Running Head: A Generative Theory of Categorization

How Causal Knowledge Affects Classification: A Generative Theory of Categorization

Authors

  • Bob Rehder
  • Shin Woo Kim
Abstract

Several theories have been proposed regarding how causal relations among features of objects affect how those objects are classified. The assumptions of these theories were tested in three experiments which manipulated the causal knowledge associated with novel categories. There were three results. The first was a multiple cause effect in which a feature's importance increases with its number of causes. The second was a coherence effect in which good category members are those whose features jointly corroborate the category's causal knowledge. These two effects can be accounted for by assuming that good category members are those likely to be generated by a category's causal laws. The third result was a primary cause effect, in which primary causes are more important to category membership. This effect can also be accounted for by a generative account with one additional assumption: that categories often possess hidden generative causes.

How Causal Knowledge Affects Classification: A Generative Theory of Categorization

A ubiquitous tendency of human cognition is to perceive objects and events as examples of types, kinds, or categories rather than as completely novel entities. As a result, one of the longstanding goals in cognitive research has been to uncover how humans conceive of and acquire knowledge about categories. On the one hand, it is clear that much of what we know about types of objects—that dogs bark, that lemons are sour, and that stoves can burn one's fingers—is the result of first-hand experience in which we observe how features of objects and events covary with one another and with the labels used to refer to those objects. However, it is also clear that people possess various kinds of theoretical, explanatory, or causal knowledge about categories that they do not observe directly.
For example, we not only know that birds have wings, fly, and build nests in trees, but also that they build nests in trees because they can fly and fly because they have wings; not only that automobiles have gas and spark plugs and produce carbon monoxide, but also that gas and spark plugs interact to produce the carbon monoxide. Research has demonstrated that theoretical knowledge like this influences people's performance on a variety of tasks. Categories are learned faster to the extent they are consistent with the learner's existing theoretical knowledge (Lien & Cheng, 2000; Murphy & Allopenna, 1994; Rehder & Ross, 2001b; Waldmann, Holyoak, & Fratianne, 1995; Wattenmaker, Dewey, Murphy, & Medin, 1986). Such knowledge influences how features of one category are projected to another, or generalized to a superordinate category (Hadjichristidis, Sloman, Stevenson, & Over, 2004; Heit & Rubinstein, 1994; Medin, Coley, Storms, & Hayes, 2003; Rehder, in press-b; Rehder & Hastie, 2004; Sloman, 1994). It also influences how one infers the presence of unobserved features in category members (Rehder & Burnett, 2005). Finally, theoretical knowledge affects how objects are classified into their correct category (Ahn, 1998; Rehder, 2003a, 2003b; Rehder & Hastie, 2001a; Sloman, Love, & Ahn, 1998; Wisniewski, 1995). This article addresses how one type of theoretical knowledge—causal knowledge—influences the classification of objects.

There are a number of ways that causal knowledge can be integrated into the mental representation of a category. One often has knowledge of the causal relations that a category participates in with other categories (e.g., that mosquitoes can cause malaria, or that having HIV can be caused by unsafe sex; see Lien & Cheng, 2000).
Causal knowledge may relate a category to its constituent features, such that one can infer the former from the latter (e.g., given the properties of a balloon "was stretched first" or "blown up by an adult," one can infer membership in the category "inflated"; see Pazzani, 1991). The current study is concerned with the causal knowledge that can relate the features of a category directly to one another (such as birds flying because they have wings, and building nests in trees because they fly), and how that knowledge influences the way objects are classified.

This article is organized as follows. The first section below summarizes the existing data on how interfeature causal links affect classification. The next section presents three theories which have been proposed to account for these data, and evaluates their success at doing so. Three new experiments are then presented providing new tests of the effect of causal knowledge on classification. The theories are then reevaluated in light of the new experimental results.

Causal Knowledge and Classification: Current Evidence

There have been two basic approaches to the study of the effect of interfeature causal relations on classification. One involves the testing of natural categories, and uses a two-step process in which causal relations are first elicited from participants, and then the influence of features on classification as a function of their role in the system of causal relations is assessed. For example, Sloman et al. (1998) presented participants with features of everyday categories like birds, apples, and chairs written on paper, and asked them to draw "dependency relations" between those features. Dependency relations were intended to subsume several types of asymmetric relations, causal relations among them (an effect feature "depends on" its cause).
They found, for example, that most people believe that flying depends on wings (for robins), that being juicy depends on having skin (for apples), and that being comfortable depends on having a back (for chairs). To assess the importance of a particular feature, participants were presented with objects without that feature but which otherwise possessed all of the other typical features of a category (e.g., a robin without wings), and were asked to make a number of judgments related to category membership (also see Ahn, 1998; Kim & Ahn, 2002).

Although these studies have yielded informative and interesting findings, one drawback is that the use of natural stimuli introduces potential confounds that complicate the interpretation of the results. For example, features that vary in their causal role might also differ in the frequency with which they appear in category members, and it is well known that more prevalent features provide stronger evidence of category membership (Rosch & Mervis, 1975).

To control for factors such as these, a second technique has been to test artificial (but realistic sounding) categories with which participants have no prior experience, and to teach them interfeature causal relations as part of the experimental session. For example, Figure 1 presents three different ways that features of artificial categories have been linked with causal relationships. In the common-cause network (Figure 1A), one feature, F1, is described as the cause of the three other features, F2, F3, and F4. In the common-effect network (Figure 1B), feature F4 is described as being caused by the three other features. In the chain network (Figure 1C), feature F1 causes F2, which in turn causes F3, which causes F4.
Throughout this article we will refer to features that have no explicit causes (e.g., F1 in the common-cause and chain networks) as primary causes and those which have no effects (e.g., F4 in the common-effect and chain networks) as peripheral features. Features with both causes and effects (e.g., F2 and F3 in the chain network) will be referred to as intermediate features.

Table 1 presents the features for one of the artificial categories from a number of our own studies which have been used to instantiate the causal networks in Figure 1. For example, in Rehder (2003a) participants were told that Myastars (a type of star) are typically made up of ionized helium, although some are made up of normal helium. They were told that the other typical features of Myastars are that they are unusually hot (for a star), are very dense, and have a large number of planets. Table 1 also presents examples of the fabricated (but plausible) causal relations between those features. (Whereas the three causal networks shown in Figure 1 require only four features, Table 1 includes a fifth feature, and a causal relation involving that feature. This additional feature and causal relation will be used in the upcoming experiments.) Each causal relation not only specified the cause and effect feature (e.g., "Ionized helium causes the star to have high density."), but also a small amount of mechanistic detail regarding why the causal relationship held.

The question of how interfeature causal relations influence classification can be divided into two subquestions. The first is how causal knowledge affects the influence of individual features on classification. The second concerns how certain combinations of features make for better category members. There is precedent in categorization research for considering these two different types of effects.
For example, Medin and Schaffer (1978) distinguished independent cue models, in which each feature provides an independent source of evidence for category membership, from interactive cue models, in which a feature's influence depends on what other features are present. Prototype models and exemplar models have been the most prominent examples of independent and interactive cue models, respectively, with the well-known result that exemplar models generally provide a better description of human classification (Medin & Schaffer, 1978; Nosofsky, 1986). Of course, whereas the majority of categorization studies have been concerned with how features directly observed in category members influence (independently or interactively) classification, the current study is concerned with the effect of interfeature causal knowledge.

To assess the independent and interactive effects of features, in Rehder (2003a, 2003b) participants were first instructed on an artificial category with causal knowledge (like Myastars), and then performed a classification test in which they rated the category membership of each of the 16 exemplars which can be formed from 4 binary dimensions. These ratings were then predicted with linear regression equations with predictors representing features and interactions among features. There were four predictors fi which coded whether feature Fi was present or absent. The regression weight estimated for predictor fi indicates the influence that feature Fi has on category membership ratings. There were also six predictors fij which coded the 2-way interaction between the six unique pairs of the four features. The regression weight estimated for predictor fij indicated the influence that features Fi and Fj being in agreement with one another has on ratings. (The interpretation of these two-way interactions is discussed further below.) Triple and higher-order interactions can also be included in the regression equation.
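The mechanics of this regression analysis can be sketched as follows. The ±1 contrast coding, and the weights used here to construct hypothetical ratings, are illustrative assumptions for demonstration only, not values from the actual experiments.

```python
import itertools
import numpy as np

# All 16 exemplars over 4 binary dimensions; present = +1, absent = -1
exemplars = np.array(list(itertools.product([1, -1], repeat=4)))

# Design matrix: intercept, four feature predictors f_i,
# and six pairwise-agreement predictors f_ij = f_i * f_j
pairs = list(itertools.combinations(range(4), 2))
cols = [np.ones(16)]
cols += [exemplars[:, i] for i in range(4)]
cols += [exemplars[:, i] * exemplars[:, j] for i, j in pairs]
X = np.column_stack(cols)  # shape (16, 11)

# Hypothetical ratings constructed from known weights, purely to show that
# least squares recovers them; the real analyses fit participants' ratings.
true_w = np.array([50.0, 8.2, 2.0, 2.0, 2.0, 5.1, 5.1, 5.1, 1.5, 1.5, 1.5])
ratings = X @ true_w
w, *_ = np.linalg.lstsq(X, ratings, rcond=None)
```

With ±1 coding, a feature weight of 8.2 corresponds to a 16.4-point difference in ratings between exemplars that possess the feature and those that lack it, which is how the weights are interpreted throughout.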
Note that the purpose of using linear regression was not to advance it as a psychological model of classification, but rather to use it as an analytical tool to characterize the results in theoretically neutral terms. The weights for features and 2-way interaction terms averaged over participants are presented in Figures 2A, 2B, and 2C for the common-cause, common-effect, and chain networks, respectively. These figures demonstrate that causal relations have a complex set of effects on judgments of category membership. The findings can be divided into four different types of effects, which are now summarized.

Result 1: Feature weights. The regression weights associated with the four individual features are presented in the left side of Figures 2A, 2B, and 2C. (Superimposed on the empirical data in Figure 2 are the predictions of one particular theory, discussed below.) As discussed, the weight for a feature indicates the influence it had on category membership ratings. The figures show that the relative influence of features varied as a function of the category's causal network. For example, when a category's features were related in a common-cause network (Figure 1A), the regression weight for feature F1 (the "common cause") was 8.2. That is, when F1 was present (e.g., when a star had ionized helium), categorization ratings were on average 16.4 points higher (on a 100-point scale) than when it was absent (e.g., a star with normal helium). In comparison, the average regression weight for the other three features was 2.0, indicating that the presence or absence of those features produced only a 4.0 point swing in ratings.

A different pattern of results emerged when features were arranged in a common-effect network (Figure 1B). In that condition, it was feature F4 (the "common effect") rather than F1 which was the most heavily weighed feature (a regression weight of 9.0 as compared to an average weight of 2.5 for features F1, F2, and F3).
This result indicates that a feature's influence on categorization can change when its role in a causal network changes. Finally, for the chain network (Figure 1C), it was the primary cause F1 which was the most heavily weighed feature (as it was in the common-cause network). However, note that the results obtained for causal chains have not always been consistent across experiments and studies. For example, Ahn et al. (2000) have argued that not only is the first feature in a three-feature chain the most heavily weighed feature, but the intermediate feature is also more heavily weighed than the most peripheral feature. To firmly establish the effect of causal chains on feature weights, a number of three-element chains will be tested in the experiments which follow.

Result 2: Directly-linked features. As mentioned, an advantage of regression analyses is that they allow an assessment of not only individual feature weights, but also interactions between features. The weights for the six two-way interaction terms for the networks in Figure 1 are presented in the right-hand side of Figures 2A, 2B, and 2C. These figures show that the pattern of interactions between features also varies depending on a category's causal network. For example, in the common-cause condition the regression weight for predictor f12 (which codes the 2-way interaction between F1 and F2) was 5.1. That is, when features F1 and F2 were both present or both absent (e.g., when a star had both ionized helium and high density, or had normal helium and normal density), categorization ratings were on average 10.2 points higher than when one of these features was present and the other absent.
This result can be interpreted as participants taking into account whether an object's features confirmed, or corroborated, the category's causal law between F1 and F2—ratings were higher when cause and effect were consistent with one another (both present or both absent), and lower when they were inconsistent (cause present and effect absent, or vice versa). In fact, examination of Figures 2A-C indicates that whenever a pair of features was linked with a causal relation, a substantial positive weight on the interaction between those features was observed (pairs F1F2, F1F3, and F1F4 in the common-cause network, F1F4, F2F4, and F3F4 in the common-effect network, and pairs F1F2, F2F3, and F3F4 in the chain network). Apparently, good category members are those that corroborate a category's causal laws.

Result 3: Indirectly-correlated features. The right-hand side of Figures 2A, 2B, and 2C also indicates that many of the other 2-way interaction terms had significantly positive weights. This included pairs F2F3, F2F4, and F3F4 in the common-cause network, and F1F3, F2F4, and F1F4 in the chain network. However, pairs F1F2, F1F3, and F2F3 in the common-effect network did not differ significantly from zero. How should these results be understood?

An interpretation of both Results 2 and 3 can be reached by considering the pattern of correlations that should hold between variables that are connected via the causal networks shown in Figure 1. First, in a common-cause network, variables which are directly related should of course be correlated. But in addition, variables which are effects of a common cause should be correlated with one another (albeit not as strongly as the correlations between directly-linked pairs). For example, if a disease causes three symptoms, one expects the symptoms to be correlated with one another (when a person has one symptom, the probability that they have additional symptoms increases).
The two-way interactions in Figure 2A are consistent with this pattern of expected correlations: large two-way interactions between directly-linked feature pairs (Result 2), and smaller interactions between pairs that are not directly linked but should be correlated. A similar explanation holds for the two-way interactions for the common-effect and chain networks. When variables are connected in a common-effect network, there should be large correlations between directly-linked variables, but no correlations between the causes (because the causes are independent). This set of correlations is reflected in the pattern of two-way interactions in Figure 2B. Finally, when variables are connected in a chain, there should be correlations between directly-linked variables, and weaker correlations between variables that are not directly connected. Again, this set of expected correlations is mirrored in the two-way interactions in Figure 2C.

Result 4: Higher-order interactions. As mentioned, regression analyses of categorization ratings can also include higher-order interactions between features. For example, Rehder (2003a) found that features connected in a common-effect network exhibited three-way interactions involving the common effect and two of its causes (not shown in Figure 2B). This result can be interpreted as a discounting effect in which the weight of a cause feature depends on whether the presence of the common effect F4 has already been "explained" by the presence of one of the other causes. For example, for those common-effect participants who learned Myastars, the categorization ratings for a star with a large number of planets (F4) were low when none of the purported causes of that feature were present (i.e., the star had normal helium, normal temperature, and normal density), increased substantially when one of those causes was present, but then increased only slightly more when a second cause was added.
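These expected correlational signatures can be checked with a small Monte Carlo simulation. The base rates and causal strengths below are arbitrary illustrative choices, not parameters estimated from any experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Common-cause network: F1 causes F2 and F3 (illustrative strengths)
f1 = rng.random(N) < 0.5
f2 = rng.random(N) < np.where(f1, 0.9, 0.1)
f3 = rng.random(N) < np.where(f1, 0.9, 0.1)
r_direct = np.corrcoef(f1, f2)[0, 1]    # directly-linked pair
r_indirect = np.corrcoef(f2, f3)[0, 1]  # two effects of the same cause

# Common-effect network: two independent causes of a shared effect
c1 = rng.random(N) < 0.5
c2 = rng.random(N) < 0.5
r_causes = np.corrcoef(c1, c2)[0, 1]    # causes are uncorrelated
```

The simulation reproduces the qualitative pattern described above: a strong correlation for directly-linked pairs, a weaker but positive correlation between two effects of a common cause, and a near-zero correlation between the independent causes of a common effect.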
In summary, the findings just reviewed indicate that the presence of causal relations between features has a large and complex effect on how objects are classified. Importantly, this effect is not limited to making certain features independently more or less influential (Result 1). It extends to how features interact, such that certain combinations of features make for better or worse category members (Results 2-4). We will hereafter refer to the effect that causal knowledge has on interactions among features as the coherence effect, indicating that good category members are those whose features cohere with respect to a category's causal laws.

Causal Knowledge and Classification: Current Theories

We now present three models of how interfeature causal relations affect classification, and evaluate the success those models have had in accounting for the results just presented.

Causal-Status Hypothesis

The first proposal considered is the causal status hypothesis (Ahn, 1998; Ahn et al., 2000; Sloman et al., 1998), hereafter referred to as CSH. According to CSH, features are more important to category membership (i.e., are more conceptually central, or less mutable) to the extent they are "deeper" in a category's network of causal relations. More causal features may be more important because they are perceived to support stronger inferences to other features (one infers effects from causes more readily than causes from effects; Ahn et al., 2000). As a result, a feature's importance (its centrality) will increase as a function of the number of dependents (i.e., effects) it has. This includes both its direct dependents (i.e., features it directly causes) and its indirect dependents (features that are in turn caused by the features it causes) (Sloman et al., 1998). One important property of CSH is that it has been given a precise computational implementation.
According to CSH, feature centrality can be computed by applying the iterative equation

    c_i,t+1 = Σ_j d_ij c_j,t    (1)

where c_i,t is the centrality of feature i at iteration t and d_ij is the strength of the causal link between features j and i. This equation converges in a finite number of iterations. For example, when each c_i is initialized to 1 and each causal link strength is set to 2, the centralities for F1, F2, F3, and F4 for a common-cause network (Figure 1A) converge to 6, 1, 1, and 1. Feature F1 is weighed most heavily because it has three dependents (F2, F3, and F4), which in turn have no dependents themselves. For the same parameters, the weights for a common-effect network (Figure 1B) converge to 2, 2, 2, and 1. That is, F1, F2, and F3 (which each have one dependent, F4) should be more heavily weighed than F4 itself (which has none). Finally, with these parameters the weights for a chain network converge to 8, 4, 2, and 1. Weights decrease monotonically along the chain because F1, F2, F3, and F4 have three, two, one, and zero dependents, respectively. CSH's ordinal feature weight predictions are presented schematically in the second column of Figure 3. These predicted changes to feature weights brought about by causal knowledge have been referred to as the causal status effect: centrality decreases as one moves from deeper to more peripheral features.

How do CSH's predictions fare against the empirical data in Figure 2? First, for the common-cause network it correctly predicts that the common cause is weighed more heavily than its effects. Second, CSH's predictions are incorrect for a common-effect network, because it is the common effect itself rather than its causes which is most heavily weighed. Third, the results are equivocal for the chain network: whereas CSH correctly predicts the greater weight on F1, it also incorrectly predicts that F2 should be greater than F3, which in turn should be greater than F4.
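A minimal sketch of Equation 1 follows. The synchronous-update scheme, and the convention that features with no dependents retain their initial centrality, are assumptions made here to reproduce the worked values above.

```python
def centralities(links, n, strength=2.0, init=1.0, iters=10):
    """Iterate c_{i,t+1} = sum_j d_ij * c_{j,t} over each feature's effects.

    links: (cause, effect) pairs over features 0..n-1, each with strength d_ij.
    Features with no effects (peripheral features) keep their initial value.
    """
    c = [init] * n
    for _ in range(iters):
        nxt = []
        for i in range(n):
            effects = [e for cause, e in links if cause == i]
            nxt.append(sum(strength * c[e] for e in effects) if effects else c[i])
        c = nxt
    return c

common_cause = centralities([(0, 1), (0, 2), (0, 3)], 4)   # -> [6, 1, 1, 1]
common_effect = centralities([(0, 3), (1, 3), (2, 3)], 4)  # -> [2, 2, 2, 1]
chain = centralities([(0, 1), (1, 2), (2, 3)], 4)          # -> [8, 4, 2, 1]
```

Each network converges to the fixed point quoted in the text once the number of iterations exceeds the depth of the causal graph.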
Recall, however, that although this pattern does not obtain in the data shown in Figure 2C, Ahn et al. (2000) have presented evidence for three-element chains that they interpret as indicating monotonically decreasing feature weights. (We will return to this interpretation in the General Discussion section.) For now, it is useful to distinguish a causal status effect (monotonically decreasing feature weights) from a more limited primary cause effect in which only primary causes are weighed more heavily. The results presented in Figure 2C only provide support for a primary cause effect.

Fourth and finally, CSH fails to make any predictions regarding the effect of causal knowledge on feature interactions. Thus, CSH fails to predict the pattern of two-way interactions between directly-linked and indirectly-correlated feature pairs shown in the right-hand side of Figure 2 (Results 2 and 3 above). CSH also fails to predict any effects involving higher-order interactions (Result 4). In other words, CSH fails to provide any account of the effect of feature coherence on judgments of category membership.

Is there any way to rescue CSH from these empirical failures? At least regarding the common-effect (mis)predictions, Ahn et al. (2000) have proposed that a causal status effect obtains all else being equal, but that this effect can be overwhelmed by other factors which also influence feature importance, one of which we now describe.

Relational Centrality Hypothesis

The second proposal will be referred to here as the relational centrality hypothesis, hereafter RCH (Ahn et al., 2000; Rehder & Hastie, 2001a). Whereas CSH predicts that a feature becomes more important to the extent it has many effects, RCH predicts that its importance depends on the number of causal relationships it enters into, regardless of its role as cause or effect.
On this view, the fact that features involved in many relations are important to analogical reasoning (Gentner, 1989) carries over to classification. The ordinal predictions that RCH makes for feature weights for the three networks in Figure 1 are presented in the third column of Figure 3. On the positive side, RCH predicts that F1 in the common-cause network, and F4 in the common-effect network, should be the most heavily weighed features because both are involved in three causal relationships as compared to only one for the other features. These predictions are borne out in the empirical data (Figures 2A and 2B). Unfortunately, for causal chains RCH predicts that the intermediate features F2 and F3 (which are involved in two causal relations) should be more heavily weighed than features F1 and F4 (which are involved in only one). People, in contrast, put the most weight on the primary cause F1 (Figure 2C).

Nevertheless, the greater weight on F1 could be accounted for by assuming that feature weights are sensitive to both the number of dependents (as specified by CSH) and the number of relations (RCH). This hybrid theory (hereafter CSH/RCH) also explains the common-effect results which are so troublesome for CSH alone. According to CSH/RCH, cause features are more important all else being equal, but this effect can be overturned when an effect feature is involved in multiple relations, such as the common-effect feature in Figure 1B. Of course, both RCH and CSH/RCH still predict that the intermediate features of a causal chain should be more heavily weighed than the peripheral feature (because they have both more dependents and more relations), a result which does not obtain in Figure 2C. But as mentioned, the empirical status of this result is controversial, and thus CSH/RCH would remain viable if conclusive evidence for this effect were found.
A more serious objection is that RCH, like CSH, offers no account of the coherence effect, and if neither CSH nor RCH alone accounts for the coherence effect, then neither does CSH/RCH. The importance of providing an account of the coherence effect leads to the third theory under consideration.

Causal-Model Theory

The third model is referred to as causal-model theory, or CMT (Rehder, 2003a, 2003b). According to CMT, interfeature causal relations are represented as probabilistic causal mechanisms, and classifiers consider whether an object is likely to have been produced or generated by those causal mechanisms. Objects that are likely to have been generated are considered to be good category members, whereas those unlikely to be generated are poor category members. (A formal description of CMT is provided in Appendix B.)

In some respects, CMT is the most successful of the three theories, because only it has been shown to provide good quantitative fits to the empirical data presented in Figure 2. Rehder (2003a, 2003b) fit CMT to participants' classification ratings in the common-cause, common-effect, and chain conditions and then subjected the predicted ratings to the same multiple regressions that were run on participants' ratings. The resulting predicted regression weights are presented in Figure 2 superimposed on the empirical weights. Especially notable is the fact that CMT is unique among the models in predicting the coherence effect: because good category members are those likely to have been generated by probabilistic causal mechanisms, they are those that exhibit the expected correlations between causally related features. As discussed, a population of category members generated by a causal network should exhibit correlations between directly-linked feature pairs (Result 2).
In addition, the effect features in a common-cause network and the indirectly connected features of a chain network should be more weakly correlated (Result 3). The right-hand side of Figure 2 indicates that CMT is able to reproduce all of these effects. Moreover, Rehder (2003a) has shown that CMT provides a good quantitative account of the pattern of higher-order interactions found for common-effect networks (Result 4).

In addition to accounting for the coherence effect, the left-hand side of Figure 2 indicates that CMT also provides good fits to the feature weights for all three causal networks. However, as is the case for assessing the fits of any quantitative model, it is important to distinguish which aspects of these fits are predicted by CMT on an a priori basis (that is, in a principled manner) from those that it accounts for simply by adjusting free parameters. In general, CMT predicts that feature weights will reflect the empirical frequency of features, and thus that those weights will be the same (all else being equal) when features have the same frequency (or when no frequency information is available). But all is not equal when causal relations are present, because CMT also predicts that the influence that a feature has on judgments of category membership should increase as a function of the number of causes it has. It predicts this because a feature will be more prevalent among category members to the extent that there are more causes present to generate it, and more prevalent features will in turn be more diagnostic of category membership (all else being equal). In other words, in contrast to CSH (which stresses the importance of a feature's number of dependents) and RCH (which stresses its total number of relations), CMT predicts a multiple cause effect in which a feature becomes more important as a function of its number of causes.
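The generative logic behind these predictions can be illustrated for a four-feature chain. The noisy-or combination rule and the parameter values below are simplifying assumptions made for illustration; they are not the exact formalization given in Appendix B.

```python
def chain_likelihood(ex, c=0.75, m=0.8, b=0.1):
    """Probability that the chain F1 -> F2 -> F3 -> F4 generates exemplar ex.

    ex: tuple of 0/1 feature values. c = P(F1), m = strength of each causal
    mechanism, b = background rate for an effect (noisy-or, an assumption).
    """
    p = c if ex[0] else 1 - c
    for parent, child in zip(ex, ex[1:]):
        p_child = m + b - m * b if parent else b
        p *= p_child if child else 1 - p_child
    return p

coherent_present = chain_likelihood((1, 1, 1, 1))  # all features present
coherent_absent = chain_likelihood((0, 0, 0, 0))   # all features absent
incoherent = chain_likelihood((1, 0, 1, 0))        # cause-effect pairs disagree
```

On this sketch, exemplars whose cause-effect pairs agree (all present or all absent) are assigned a much higher generative probability than incoherent exemplars, which is the source of the coherence effect on this account.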
CMT's ordinal feature weight predictions are presented schematically in the last column of Figure 3. How do these a priori predictions hold up against the empirical data in Figure 2 (its good quantitative fits notwithstanding)? On the positive side, CMT correctly predicts that feature F4 in the common-effect network is the most heavily weighed feature, because it has three causes which generate it as compared to zero for the other three features. On the negative side, however, Figure 3 indicates that for the common-cause and chain networks CMT predicts that feature F1 (with zero causes) should be weighed less heavily than the rest of the features (which each have one cause). That is, CMT does not predict the primary cause effect found with those two networks. Of course, the fits presented in Figures 2A and 2C indicate that CMT is able to reproduce the primary cause effect, but it does so by adjusting a free parameter that represents the importance of a primary cause in a causal network. Clearly, the important issue is why primary causes are weighed more heavily, and on this question CMT is silent. Thus, despite its good quantitative fits, CMT is also deficient as a full explanation of the effects of causal knowledge on categorization.

Is there any way to rescue CMT from this objection? Rehder (2003b) has proposed that primary causes can be weighed more heavily when a category is believed to possess an underlying cause which generates them, in the same way that an invisible disease (e.g., a virus) can generate symptoms which in turn generate other symptoms. Because primary symptoms are more reliably generated, they are more diagnostic of category membership than other symptoms. Note that although this proposal predicts, like CSH, that primary causes are more important, it differs in locating the source of that effect in a hidden generative cause rather than in a primary cause's greater number of dependents.
As it turns out, this distinction will be crucial in accounting for the empirical results presented below. Summary and Overview of Experiments In summary then, none of the current theories provides an entirely satisfactory account of the effects of causal knowledge on categorization. CSH and RCH each provide only a partial account of the effect on feature weights, and even together provide no account at all of the importance of coherence among features. CMT has the advantage that it predicts the coherence effect and provides good quantitative fits to the data. But even it does not provide a complete explanation for why causal knowledge changes feature weights the way it does. The purpose of this article is to arrive at a comprehensive account of the effects of causal knowledge on classification. The fact that each theory accounts for some aspects of the results suggests that causal knowledge may have a multi-faceted influence on categorization, with the principles embodied in two or more theories each exerting an influence. But on the basis of the current evidence it is not possible to evaluate such hybrid accounts (like CSH/RCH, or CMT augmented with an assumption of a hidden cause) because the systematic manipulations needed to test the principles embodied in each theory have yet to be carried out. Accordingly, three new experiments are presented which provide such tests, with the goal of identifying the set of principles that together provide a complete account of the effects of causal knowledge on classification. Experiments 1-3 address the following four questions. The first concerns how a feature’s weight is affected by its direct causal relations. Does it increase with its number of effects (as predicted by CSH), its number of causes (as predicted by CMT), or both (as predicted by RCH)? To answer this question, across the experiments a feature’s number of causes and effects is systematically manipulated.
The second question concerns how a feature’s weight is influenced by its indirect dependents. To test CSH’s prediction that it should increase as its number of indirect dependents increases, Experiments 1 and 3 manipulate a feature’s number of indirect dependents while holding other factors constant. The third question concerns how the features arranged in a causal chain are weighed. As a consequence of CSH’s claim regarding the importance of a feature’s number of dependents, it predicts that weights will decrease monotonically along a causal chain. But while some studies have found support for this prediction (e.g., Ahn et al., 2000), others have only found evidence for a primary cause effect (Rehder, 2003b). Experiments 1-3 will test three-feature causal chains to determine whether such networks exhibit a full causal status effect (monotonically decreasing feature weights) or only a primary cause effect. The final question concerns the generality of the coherence effect. Each experiment will provide a test of CMT’s prediction that feature interactions will reflect the pattern of correlations that one expects to be generated by a network of causally-related variables. With the results of these experiments in hand, it will be possible to identify the set of principles which, operating in tandem, provide a complete account of the effect of interfeature causal relations on classification. Experiment 1 In Experiment 1 each participant learned two novel categories (one after the other), each with five binary features and causal relations among those features. For one of the categories, the causal relations took on the topology labeled 2-1-2 in Figure 4A. In this network, features F1 and F2 are primary causes, feature F3 is an intermediate feature, and features F4 and F5 are peripheral features. The network in Figure 4A is labeled 2-1-2 because it has two primary causes, one intermediate feature, and two peripheral features.
The other category which participants learned exhibited one of the topologies labeled 1-1-1 in Figure 4B. In the network on the left side of Figure 4B (seen by half the participants) the primary cause is F1 and the peripheral feature is F4. In the network on the right (seen by the other half) the primary cause is F2 and the peripheral feature is F5. The networks in Figure 4B are labeled 1-1-1 because they have one primary cause, one intermediate feature, and one peripheral feature. After learning about a category and its causal relations, participants were presented with a series of potential category members and asked to generate categorization ratings for each. Those ratings were then analyzed with multiple regression to assess the weights of individual features and the two-way interactions between features. The ordinal predictions that CSH, RCH, and CMT make for feature weights for the 2-1-2 and 1-1-1 networks are presented in Figure 5. Each panel presents the predicted regression weights for primary causes (PC), feature F3, and peripheral features (Per). First consider the predictions of CSH in the second column of Figure 5. As usual, CSH predicts a monotonic decrease in feature weights for both the 2-1-2 and 1-1-1 networks: primary causes should be more heavily weighed than F3, which in turn should be more heavily weighed than the peripheral features. It predicts this because in each network the primary causes have the most dependents and the peripheral features have the fewest (zero). Although assessing the relative importance of features within a network in this manner can be informative, these predictions must be treated with caution because the comparisons involve different features. As a consequence, observed differences in relative feature importance may be affected by features' differential salience, or by the order in which they are presented on the computer screen.
For this reason, the critical tests in Experiment 1 involve comparing the same feature in the 2-1-2 and 1-1-1 networks in order to determine how its weight varies as a function of its number of causes and effects. There are two important comparisons embodied in the 2-1-2 and 1-1-1 networks. First, because in the 2-1-2 network the primary causes have two indirect dependents (F4 and F5) versus just one in the 1-1-1 networks (F4 or F5), CSH predicts that primary causes should be more heavily weighed in the 2-1-2 network as compared to the 1-1-1 network. Second, CSH predicts that feature F3 should become relatively more important in the 2-1-2 network, in which it has two effects, as compared to the 1-1-1 network, in which it has just one. These predictions are depicted in the last row of Figure 5, which shows the difference between feature weights in the two conditions. To give a quantitative example, when ci in Eq. 1 is initialized to 1 and each causal link strength is set to 2, the weights of the primary causes, feature F3, and the peripheral features yielded by Eq. 1 are 8, 4, and 1 for the 2-1-2 network, and 4, 2, and 1 for the 1-1-1 network. The differences between these feature weights (4, 2, and 0) confirm the relatively greater importance of the primary causes and F3 in the 2-1-2 network. Note that these comparisons are enabled by use of the two versions of the 1-1-1 network shown in Figure 4B: By averaging over the two versions, any potential effects of the primary cause being F1 or F2, and the peripheral feature being F4 or F5, were eliminated. Turning now to RCH, the third column of Figure 5 indicates that RCH predicts a very different pattern of feature weights. Whereas CSH predicts that primary causes should be most heavily weighed, RCH predicts that it should be feature F3 instead, because in both networks it is involved in the largest number of causal relationships.
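The worked example above (ci initialized to 1, each causal link strength set to 2) can be reproduced with a short recursion. The recursive form below is an assumption chosen to match the reported numbers, not the literal statement of Eq. 1: a feature with no dependents keeps the initial weight, and every other feature's weight is the link strength times the sum of its direct effects' weights.

```python
def dependency_weights(effects, strength=2, terminal=1):
    """Illustrative dependency-model weights.

    `effects` maps each feature to the features it directly causes.
    Features with no effects receive the initial weight (1); all others
    receive the link strength (2) times the summed weights of their
    direct effects. (An assumption consistent with the worked example
    in the text, not necessarily the exact form of Eq. 1.)
    """
    memo = {}
    def w(f):
        if f not in memo:
            deps = effects[f]
            memo[f] = terminal if not deps else strength * sum(w(d) for d in deps)
        return memo[f]
    return {f: w(f) for f in effects}

# 2-1-2 network: F1 and F2 cause F3, which causes F4 and F5
w212 = dependency_weights({"F1": ["F3"], "F2": ["F3"],
                           "F3": ["F4", "F5"], "F4": [], "F5": []})
# 1-1-1 network (the left version in Figure 4B): F1 -> F3 -> F4
w111 = dependency_weights({"F1": ["F3"], "F3": ["F4"], "F4": []})

assert (w212["F1"], w212["F3"], w212["F4"]) == (8, 4, 1)
assert (w111["F1"], w111["F3"], w111["F4"]) == (4, 2, 1)
```

The differences between the two networks (8 − 4, 4 − 2, 1 − 1) reproduce the 4, 2, and 0 pattern described in the text.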
Moreover, the last row of Figure 5 indicates that, according to RCH, feature F3 should be relatively more important in the 2-1-2 condition as compared to the 1-1-1 condition, because in the 2-1-2 network it is involved in four causal relations versus just two in the 1-1-1 network. Finally, Figure 5 indicates that CMT predicts that feature F3 (with two causes) should be most heavily weighed in the 2-1-2 condition, followed by the peripheral features (with one cause each), followed by the primary causes (with zero causes). In the 1-1-1 condition, feature F3 and the peripheral features (each with one cause) should be weighed more heavily than the primary cause. The critical prediction is that feature F3 should be relatively more important in the 2-1-2 condition as compared to the 1-1-1 condition, because in the 2-1-2 network it has two causes versus one in the 1-1-1 network. Figure 5 illustrates how the comparison between the 2-1-2 and 1-1-1 networks provides a strong test of each model's predictions regarding how feature importance changes as a function of the number of causes, effects, and indirect dependents. But recall that CMT also predicts that good category members are those that manifest the pattern of interfeature correlations generated by the category’s causal model. In the 2-1-2 condition there are four feature pairs that are directly causally related (F1F3, F2F3, F3F4, and F3F5), and CMT predicts that category membership ratings will be sensitive to whether correlations between those feature pairs are maintained. For example, because F1 and F3 are causally linked, it predicts that category membership ratings will be higher when F1 and F3 are both present or both absent, and lower when one is present and the other absent. In addition, in a 2-1-2 network a number of other feature pairs should be correlated (albeit more weakly).
For example, features F1 and F4 should be correlated, because those features are linked indirectly via feature F3. The same is true for feature pairs F1F5, F2F4, and F2F5. Finally, features F4 and F5 should also be correlated, because they share a common cause (F3). The two 1-1-1 networks in Figure 4B each have their own unique pattern of interfeature correlations. For the 1-1-1 network in the left panel, the directly linked feature pairs F1F3 and F3F4, and the indirectly linked pair F1F4, should be correlated. For the 1-1-1 network in the right panel, the directly linked feature pairs F2F3 and F3F5, and the indirectly linked pair F2F5, should be correlated. The directly linked and indirectly correlated feature pairs for the networks tested in Experiment 1 are summarized in Table 2. Method Materials. Six novel categories were tested: two nonliving natural kinds (Myastars, Meteoric Sodium Carbonate), two biological kinds (Kehoe Ants, Lake Victoria Shrimp), and two artifacts (Romanian Rogos, Neptune Personal Computers). Each category had five binary features which were described as distinctive relative to a superordinate category. Each feature was also described as probabilistic, that is, not all category members possessed it (e.g., "Most Myastars have an unusually high density whereas some are of normal density.", "Most Myastars have an unstable gravitational field whereas some have a normal gravitational field.", etc.) Each causal relationship was described as one feature causing another (e.g., "High density causes the star to have an unstable gravitational field."), accompanied by one or two sentences describing the mechanism responsible for the causal relationship (e.g., "The star's high density makes its structure unstable, which causes the gravitational fluctuations."). The features of Myastars and the causal relationships between those features are shown in Table 2.
A complete list of the features and causal relationships for all six experimental categories is available from the author. Participants. Forty-eight New York University undergraduates received course credit for participating in this experiment. Design. The single within-subject manipulation was the topology of the category’s causal network (2-1-2 or 1-1-1). In addition there were four between-subject counterbalancing factors. First, a participant learned either the two biological kinds, the two nonliving natural kinds, or the two artifacts. Second, within those kinds, which category was presented first was balanced (e.g., of the participants who learned the nonliving natural kinds, half learned Myastars first and half learned Meteoric Sodium Carbonate first). Third, half the participants learned the 2-1-2 category first followed by the 1-1-1 category, whereas this order was reversed for the other half. Fourth, when learning the 1-1-1 network, half the participants learned the specific topology on the left side of Figure 4B, and the other half learned the topology on the right side. Procedure. Experimental sessions were conducted by computer. Each participant learned two categories. For each, participants first studied several screens of information about the category at their own pace and then performed a classification test. The initial screens presented a cover story, the category’s features, and the fact that each feature occurred in “most” category members. Participants were then instructed on either the four causal relationships that formed a 2-1-2 network or the two that formed a 1-1-1 network. Participants also viewed a diagram like those in Figure 4 depicting the structure of the causal links. When ready, participants took a multiple-choice test on the knowledge they had just studied.
While taking the test, participants were free to return to the information screens they had studied; however, doing this obligated the participant to retake the test. The only way to pass the test and proceed to subsequent phases was to take it all the way through without errors and without returning to the initial information screens for help. During the classification test participants rated the category membership of all 32 possible exemplars that can be formed from five binary features. Features were listed in order (dimensions 1 through 5) on the computer screen. Responses were entered by positioning a slider on a scale whose left end was labeled "Definitely not an X" and whose right end was labeled "Definitely an X,” where X was the name of the category. The slider could be set to 21 distinct positions. Responses were scaled into the range 0-100. The order of presentation of the 32 exemplars was randomized for each participant. Experimental sessions lasted approximately 50 minutes. Results The average category membership ratings given to each of the 32 test exemplars in each condition are presented in Appendix A. To determine the effect of causal network on the importance of features, and the interactions between features, those ratings were first analyzed by performing a multiple regression for each participant for each of his or her two networks. Five predictor variables (f1, f2, f3, f4, f5) were coded as –1 if the feature was absent, and +1 if it was present. Recall that the regression weight associated with each fi represents the influence that feature i had on category membership ratings. A positive weight indicates that the presence of a feature increased categorization ratings and its absence decreased them. An additional ten predictor variables were constructed by computing the multiplicative interactions between each possible feature pair: f12, f13, f14, f15, f23, f24, f25, f34, f35, and f45.
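The regression setup described here can be sketched as follows. The ratings below are synthetic, generated from arbitrary weights purely for illustration, and numpy's least-squares routine stands in for whatever analysis software was actually used.

```python
import itertools
import numpy as np

# All 32 exemplars over five binary features, coded -1 (absent) / +1 (present)
exemplars = np.array(list(itertools.product([-1, 1], repeat=5)))   # 32 x 5

# The ten two-way interaction predictors are the pairwise products
# f12, f13, ..., f45
pairs = list(itertools.combinations(range(5), 2))
interactions = np.column_stack([exemplars[:, i] * exemplars[:, j]
                                for i, j in pairs])                # 32 x 10
X = np.column_stack([np.ones(32), exemplars, interactions])        # 32 x 16

# Synthetic ratings for one hypothetical participant: a baseline of 50
# plus feature contributions (generating weights chosen arbitrarily
# for illustration) plus noise
rng = np.random.default_rng(0)
true_w = np.array([5.7, 5.7, 8.1, 5.3, 5.3])
ratings = 50 + exemplars @ true_w + rng.normal(0, 1, size=32)

# One regression per participant: coefficients 1-5 are the feature
# weights, coefficients 6-15 the two-way interaction weights
coefs = np.linalg.lstsq(X, ratings, rcond=None)[0]
```

Because the full factorial design makes all sixteen predictor columns mutually orthogonal, each recovered coefficient estimates its generating weight directly; a weight of 8.1 on f3, for example, corresponds to a 16.2-point swing in ratings between F3 present and F3 absent.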
These variables are coded –1 if one of the features is present and the other absent, and +1 if both are present or both absent. Recall that for those feature pairs which are directly linked or indirectly correlated, a positive interaction weight indicates that ratings are sensitive to whether the expected correlation is preserved (cause and effect both present or both absent) or broken (one present and the other absent). The effects of causal network on feature weights and feature interactions are presented separately in the following two sections. Feature weights. To make the feature regression weights comparable across the 2-1-2 and 1-1-1 networks, weights other than those for F3 were aggregated according to whether the feature was a primary cause or a peripheral feature. Initial analyses of the feature weights revealed that there were no effects of the four between-subject counterbalancing variables. That is, feature weights were unaffected by whether the 2-1-2 or the 1-1-1 network was presented first, or by whether the 1-1-1 network was instantiated by the network on the left or right side of Figure 4B. In addition, feature weights did not vary as a function of whether the categories were biological kinds, nonliving natural kinds, or artifacts, or of the order in which the categories were presented. Finally, there were no interactions between network type and category, indicating that feature weights were the same regardless of which specific category manifested which type of network. The presented results are thus collapsed over these factors. Regression weights averaged over participants for the four feature types are presented for the 2-1-2 and 1-1-1 networks in Figures 6A and 6B, respectively. In the 2-1-2 condition the weight for feature F3 was greater than for the other features.
For that network, F3’s weight was 8.1, indicating that, on average, categorization ratings were 16.2 points higher (on a 100-point scale) when F3 was present versus when it was absent. This weight was greater than those for the primary causes (5.7, representing a swing in ratings of 11.4 points) or the peripheral features (5.3, a swing in ratings of 10.6 points). In comparison, in the 1-1-1 condition F3 was not more heavily weighed than the primary cause or the peripheral feature. Instead, an apparent primary cause effect obtained in which the primary cause (7.8) was more heavily weighed than F3 (6.6). Feature F3 in turn had approximately the same weight as the peripheral feature (6.4). The different pattern of weights in the two conditions is depicted in Figure 6C, which shows the difference in weights between the two conditions. The figure indicates that feature F3, but not the primary causes, was weighed relatively more heavily in the 2-1-2 condition as compared to the 1-1-1 condition, a result predicted by RCH and CMT but not CSH (Figure 5). A 2x3 repeated measures ANOVA was conducted on the feature weights where the two factors were network (2-1-2 or 1-1-1) and feature type (primary cause, F3, or peripheral feature). An interaction between network and feature type, F(2, 94) = 8.13, MSE = 10.4, p < .001, confirmed that the pattern of weights differed for the two networks. To test the specific prediction (made by CSH, RCH, and CMT) that feature F3 would be relatively more important in the 2-1-2 network, its weight was compared to that of the peripheral features (which have the same number of causes and effects in the two conditions). A 2x2 ANOVA where the factors were network and the contrast between F3 and the peripheral features revealed a significant interaction, F(1, 47) = 12.61, MSE = 37.2, p < .001, confirming the relatively greater importance of F3 in the 2-1-2 condition.
A separate analysis of feature F3 alone revealed that its weight was also greater in absolute terms in the 2-1-2 condition (8.1 vs. 6.6), F(1, 47) = 4.20, MSE = 13.1, p < .05. To test the specific prediction (made by CSH) that the primary causes would have a relatively greater weight in the 2-1-2 network (because of their greater number of indirect dependents), a 2x2 ANOVA was conducted where the factors were network and the contrast between the primary causes and the peripheral features. The interaction was not significant in this analysis, F(1, 47) = 1.53, MSE = 8.4, p > .20, indicating that primary causes were not relatively more important in the 2-1-2 condition. Note that the difference between primary causes and peripheral features in the 2-1-2 condition (5.7 – 5.3 = 0.4) was smaller than the corresponding difference in the 1-1-1 condition (7.8 – 6.4 = 1.4), an effect in the direction opposite to that predicted by CSH. Finally, a separate analysis of the 1-1-1 condition revealed that despite the elevated weight on the primary cause, the overall effect of feature type did not reach significance, F(2, 94) = 2.21, MSE = 12.8, p = .11, nor did the more focused contrast between the primary cause and feature F3, F(1, 47) = 2.60, MSE = 14.0, p = .11. The contrast between F3 and the peripheral feature did not approach significance, F < 1. Feature interactions. To make the two-way interactions among features comparable across causal networks, those interactions were aggregated according to whether they were between features that were directly linked or indirectly correlated (Table 2). As was the case for feature weights, there were no effects of the four between-subject counterbalancing factors on the feature interactions, and the results are thus presented collapsed over these factors. The two types of interaction weights averaged over participants are presented in Figures 6D and 6E.
As predicted by CMT but not CSH or RCH, in both the 2-1-2 and 1-1-1 conditions the interactions between directly linked or indirectly correlated features were positive. This result indicates that category membership ratings were higher when two features which should be correlated were either both present or both absent, and lower when one was present and the other absent. For example, in the 1-1-1 condition the average regression weight on directly linked features was 3.07, indicating that, all else being equal, categorization ratings were 6.1 points higher when members of a directly-linked feature pair were both present or both absent, as compared to when one was present and the other absent. Also as predicted, the magnitudes of the interactions between indirectly-correlated features were smaller than those between directly-linked features. These results support the view that exemplars are good category members to the extent they manifest the pattern of interfeature correlations one expects to be generated by a category’s causal model. These results were also supported by statistical analysis. First, in both conditions, each type of interaction was significantly greater than 0, p’s < .0001, confirming that ratings were sensitive to whether exemplars manifested the expected pattern of interfeature correlations. Next, a 2x2 repeated measures ANOVA was conducted on the interaction weights where the two factors were network (2-1-2 or 1-1-1) and interaction type (directly linked or indirectly correlated). There was a main effect of interaction type, F(1, 47) = 16.37, MSE = 3.54, p < .001, confirming that directly-linked interactions were greater than indirectly-correlated ones. The interaction between network and interaction type did not reach significance (F < 1). Selected exemplars.
The preceding two sections document how the importance of individual features and combinations of features to classification judgments is affected by causal knowledge. Figure 7 presents how these changes are manifested in the ratings of individual exemplars. Figure 7A presents the average ratings in the 2-1-2 condition for exemplars which possess all five typical features except that they are missing either one of the primary causes, feature F3, or one of the peripheral features; Figure 7B presents the same for the 1-1-1 condition. Figure 7A indicates that an exemplar missing just F3 is a worse category member than one missing just a primary cause or just a peripheral feature. According to the regression analyses just presented, there are two reasons for this. First, feature F3 is weighed more heavily than the other features (Figure 6A). Second, an exemplar missing only feature F3 breaks all four of the direct correlations expected to be produced by a 2-1-2 network: between F1 and F3, F2 and F3, F3 and F4, and F3 and F5 (Figure 6D). In contrast, Figure 7B indicates that an exemplar missing just F3 is not a worse category member than the other exemplars, reflecting the different regression weights found in the 1-1-1 condition. Discussion The results from Experiment 1 have several implications for the three models under consideration. First, the predictions of CSH were generally not supported by this experiment. CSH predicts that in the 2-1-2 condition primary causes should be more influential than feature F3, which in turn should be more influential than the peripheral features. Instead, it was feature F3 which was the most heavily weighed feature. CSH predicts the same monotonic decrease in feature weights for the 1-1-1 network, but although the data revealed a (marginally significant) primary cause effect, there was no difference between the intermediate and peripheral features.
Most importantly, CSH failed to predict the changes in the relative weights brought about by manipulating features’ number of causes and effects. CSH incorrectly predicted that the primary causes should be relatively more important in the 2-1-2 network (in which they have two indirect dependents) as compared to the 1-1-1 network (in which they have just one), a result which was not observed. Second, RCH generally fared better than CSH because it correctly predicted that the intermediate feature but not the primary causes would increase in relative importance in the 2-1-2 network as compared to the 1-1-1 network. However, it incorrectly predicted that feature F3 would be most heavily weighed in the 1-1-1 condition. This result replicates previous findings in which intermediate features in a causal chain are not the most heavily weighed features (e.g., Figure 2C). Third, recall that a model which combines the principles of CSH and RCH, CSH/RCH, has also been proposed. However, although CSH/RCH explains why feature F3 is the most heavily weighed in the 2-1-2 condition, it still incorrectly predicts that primary causes should have relatively greater importance in the 2-1-2 network versus the 1-1-1 network. Moreover, because CSH and RCH each predict that feature F3 should be more important than the peripheral feature in the 1-1-1 condition, CSH/RCH predicts an especially large advantage for F3. As we have seen, however, feature F3 and the peripheral feature were weighed equally in that condition. Of the theories under consideration, only CMT correctly predicted the greater weight on F3 in the 2-1-2 condition. It is also the only theory that accounted for the coherence effect, that is, the fact that good category members are those that manifest the interfeature correlations which should be generated by a causal network.
On the negative side, CMT predicted that the primary causes should be weighed least heavily in both networks, a result which did not obtain. Recall, however, that such within-network comparisons may not be valid because the features involved also differ in other ways (e.g., the order in which they are presented on the computer screen). We will return to this issue in Experiments 2 and 3. Experiment 2 One result from Experiment 1 is that a feature’s influence was greater when it had two causes and two effects as compared to one cause and one effect. However, it is uncertain whether this result arose because of the greater number of causes or the greater number of effects (or both). The primary purpose of Experiment 2 was to test how a feature’s importance changes solely as a function of its number of causes. Participants learned two categories which manifested the two network topologies shown in Figure 8 labeled 3-1-1 and 1-1-1. The 3-1-1 and 1-1-1 topologies allow an assessment of how the influence of feature F4 changes solely as a function of its number of causes, because whereas in the 3-1-1 network F4 has three causes and one effect, it has two fewer causes in the 1-1-1 networks. The predictions that CSH, RCH, and CMT make for the 3-1-1 network are shown in Figure 9. For both networks CSH predicts a monotonic decrease in feature weights (primary causes > F4 > peripheral features). Moreover, the last row of Figure 9 indicates that it predicts that the relative weight of features should not differ in the two networks. It makes this prediction because in both networks the primary causes have two dependents (one direct, one indirect), feature F4 has one direct dependent, and the peripheral features have zero. Second, RCH’s predictions are the same as in Experiment 1.
Not only should the intermediate feature be the most heavily weighed in both conditions (because in both it is involved in the greatest number of relations), but F4 should also be relatively more important in the 3-1-1 condition (in which it is involved in four causal relations) as compared to the 1-1-1 condition (in which it is involved in two). CMT’s qualitative predictions are also the same as in Experiment 1. It predicts that feature F4 (with three causes) should be the most heavily weighed in the 3-1-1 condition, followed by the peripheral features (with one), and then the primary causes (with zero). In the 1-1-1 condition, F4 and the peripheral features (one cause each) should be equal and the primary causes least important. Most importantly, CMT predicts that feature F4 should be relatively more important in the 3-1-1 condition as compared to the 1-1-1 condition, because in the former network it has three causes as compared to one in the latter. Of course, CMT also predicts that category membership ratings should reflect the pattern of interfeature correlations generated by the networks shown in Figure 8. In the 3-1-1 condition there are four feature pairs that are directly causally related (F1F4, F2F4, F3F4, and F4F5), and category membership ratings should thus be sensitive to whether correlations between those feature pairs are maintained. In addition, in the 3-1-1 network feature pairs F1F5, F2F5, and F3F5 should be indirectly correlated (because they are connected in a causal chain via F4). For each 1-1-1 network, there are two pairs that are directly linked and one that is indirectly correlated. The directly-linked and indirectly-correlated feature pairs for each causal network tested in Experiment 2 are summarized in Table 2. Method Materials.
The materials used in Experiment 2 were identical to those used in Experiment 1, except for the different causal relations required to construct the causal networks in Figure 8. Participants. Seventy-two New York University undergraduates received course credit for participating in this experiment. Design. The single within-subject manipulation was the topology of the category’s causal network (3-1-1 or 1-1-1). In addition, the same four between-subject counterbalancing factors used in Experiment 1 were used in this experiment. Procedure. The procedure was identical to that of Experiment 1. Results The average category membership ratings in each condition are presented in Appendix A. As in Experiment 1, multiple regressions were performed on those ratings for each participant for each of his or her 3-1-1 and 1-1-1 networks. There were no effects of the four between-subject counterbalancing factors of theoretical interest, and thus the results for feature weights and interactions are presented collapsed over these factors. Feature weights. Regression weights averaged over participants for each feature type are presented for the 3-1-1 and 1-1-1 networks in Figures 10A and 10B, respectively. The primary causes in the 3-1-1 condition are F1, F2, and F3; in the 1-1-1 condition it is F1, F2, or F3. As predicted by CMT and RCH but not CSH, the weight for feature F4 in the 3-1-1 condition (8.1) was greater than for the primary causes (5.6) and the peripheral feature (7.1). In comparison, in the 1-1-1 condition F4 was not the most heavily weighed feature. Instead, the weights in that condition were virtually identical (6.3, 6.5, and 6.8 for the primary cause, F4, and peripheral feature, respectively). The different pattern of weights in the two conditions is depicted in Figure 10C, which shows the difference in weights between the two conditions.
The figure indicates that F4 was weighed relatively more heavily in the 3-1-1 condition as compared to the 1-1-1 condition, a result predicted by RCH and CMT but not CSH (Figure 9). A 2x3 repeated measures ANOVA was conducted on the feature weights where the two factors were network (3-1-1 or 1-1-1) and feature type (primary cause, F4, or peripheral feature). There was an interaction between network and feature type, F(2, 142) = 6.14, MSE = 8.9, p < .005, indicating that the pattern of weights differed for the two networks. To test the specific hypothesis that feature F4 had a relatively greater weight in the 3-1-1 network, a 2x2 ANOVA was conducted where the factors were network and the contrast between F4 and the two other feature types (which were both equated on their number of causes and effects in the two conditions). A significant interaction in this analysis confirmed the relatively greater importance of F4 in the 3-1-1 condition, F(1, 71) = 8.64, MSE = 31.3, p < .005. A separate analysis of feature F4 alone also confirmed that its weight was greater in the 3-1-1 condition as compared to the 1-1-1 condition in absolute terms, F(1, 71) = 6.89, MSE = 15.2, p < .05. A separate analysis of the 1-1-1 condition revealed no main effect of feature, F < 1.

Feature interactions. The two-way interactions among features were aggregated according to whether the features were directly linked or indirectly correlated (Table 2). The two types of interaction weights averaged over participants are presented for the 3-1-1 and 1-1-1 networks in Figures 10D and 10E. As predicted by CMT but not CSH or RCH, in both conditions the interactions between both directly-linked and indirectly-correlated features were positive, confirming once again that categorization ratings were higher when two features which should be correlated were either both present or both absent, and lower when one was present and the other absent.
Also as predicted, the interactions between indirectly-correlated feature pairs were smaller than those between directly-linked pairs. In both conditions, each interaction type was significantly greater than 0, p's < .0001. In addition, a 2x2 repeated measures ANOVA with network (3-1-1 or 1-1-1) and interaction type (directly linked or indirectly correlated) as factors revealed an effect of interaction type, F(1, 71) = 26.67, MSE = 4.5, p < .0001, confirming that directly-linked interactions were greater than indirectly-correlated interactions. There was also an interaction between network and interaction type, F(1, 71) = 10.58, MSE = 2.2, p < .01, indicating that this difference was larger in the 1-1-1 condition as compared to the 3-1-1 condition.

Discussion

Once again, the predictions of CSH were not supported by this experiment. For both the 3-1-1 and 1-1-1 networks, CSH predicts a monotonic decrease in feature weights. Instead, in the 3-1-1 condition it was feature F4 which was the most heavily weighed, and features were weighed equally in the 1-1-1 condition. More importantly, CSH also failed to predict the changes in the relative weights brought about by manipulating the intermediate feature's number of causes. Whereas CSH predicted no difference in the pattern of feature weights between the two networks, in fact feature F4 was weighed relatively more heavily in the 3-1-1 condition. RCH fared better than CSH in that it correctly predicted the relatively greater weight on feature F4 in the 3-1-1 condition. However, just as in Experiment 1, it incorrectly predicted a greater weight for the intermediate feature in the 1-1-1 condition. CSH and RCH together are also unable to explain the current results.
Although CSH/RCH accounts for why feature F4 is the most heavily weighed in the 3-1-1 condition, it still incorrectly predicts that the primary causes should be more important than peripheral features in the 3-1-1 condition (because they have more dependents) and that feature F4 should be more important than peripheral features in the 1-1-1 condition (because it has both more dependents and more relations). Of the three theories, only CMT correctly predicted the relative ranking of feature weights in the 3-1-1 condition: F4 followed by the peripheral features followed by the primary causes. As in Experiment 1, it once again incorrectly predicted that the primary cause would be least important (again, with the caveat that these comparisons involve features which vary in other ways). Nevertheless, the critical successes for CMT in this experiment are that (a) it correctly predicted that feature F4 would be relatively more important in the 3-1-1 condition as compared to the 1-1-1 condition, and (b) it was also the only theory that accounted for the fact that good category members are those that manifest the interfeature correlations generated by a causal network, that is, the coherence effect.

Experiment 3

Whereas Experiment 2 tested how a feature's importance changes as a function of its number of causes, Experiment 3 tests how it changes as a function of its number of effects. Participants learned two categories which manifested the two network topologies shown in Figure 11. The 1-1-3 and 1-1-1 networks allow an assessment of how the influence of feature F2 changes with its number of effects, because whereas in the 1-1-3 network F2 has one cause and three effects, it has two fewer effects in the 1-1-1 networks. A second purpose of the 1-1-3 and 1-1-1 networks is that they allow another assessment of how feature importance changes as a function of indirect dependents.
Whereas Experiment 1 showed that primary causes were not more heavily weighed when they had two indirect dependents versus one, in this experiment the primary cause F1 has three indirect dependents in the 1-1-3 network versus only one in the 1-1-1 networks. By carrying out a stronger manipulation, Experiment 3 will thus provide a more stringent test of CSH's claim that feature weights increase with the number of indirect dependents. The predictions that CSH, RCH, and CMT make for the 1-1-3 network are presented in Figure 12. As usual, CSH predicts a monotonic decrease in feature weights for both networks. It also predicts that features F1 and F2 should both be more heavily weighed in the 1-1-3 network (because F1 has three indirect dependents in the 1-1-3 network vs. one in the 1-1-1 network, and because F2 has three direct dependents in the 1-1-3 network vs. one in the 1-1-1 network). As usual, RCH predicts that the intermediate feature (F2) should be most heavily weighed in both networks, because in both it is involved in the largest number of causal relationships. In addition, feature F2 should be more heavily weighed in the 1-1-3 condition (in which it is involved in four relationships) as compared to the 1-1-1 condition (in which it is involved in two). Finally, CMT predicts no differences in relative feature weights between the two networks, because feature F1, feature F2, and the peripheral features each have the same number of causes across the two conditions (zero, one, and one, respectively). Note that the design of Experiment 3 also allows an alternative interpretation of the first two experiments to be addressed. In each of those experiments, the intermediate feature involved in four causal relations (F3 in Experiment 1, F4 in Experiment 2) was weighed more heavily than the other features.
However, one simple explanation for this effect is that, because it participated in the most causal relations, the intermediate feature was mentioned in the initial description of the category more often than the other features. That is, rather than having anything to do with causal structure per se, the intermediate feature's greater influence may have arisen from its being relatively salient from being presented repeatedly. Experiment 3 will address this possibility, because now it will be feature F2 that will be presented most frequently in the 1-1-3 condition (by virtue of its involvement in four causal relations), and which thus should be (on this account) most salient and weighed most heavily. On the basis of the results from Experiments 1 and 2, it is also expected that category membership ratings will reflect the pattern of interfeature correlations generated by the networks shown in Figure 11. In the 1-1-3 condition there are four feature pairs that are directly causally related (F1F2, F2F3, F2F4, and F2F5). In addition, in that network feature pairs F1F3, F1F4, and F1F5 should be indirectly correlated (because they are connected in a causal chain via F2), as should F3F4, F3F5, and F4F5 (because they have a common cause in F2). For each of the 1-1-1 networks, there are two pairs that are directly linked and one that is indirectly correlated. The directly-linked and indirectly-correlated feature pairs for each causal network tested in Experiment 3 are presented in Table 2.

Method

Materials. The materials used in Experiment 3 were identical to those used in Experiments 1 and 2, except for the causal relations required by the networks in Figure 11.

Participants. Seventy-two New York University undergraduates received course credit for participating in this experiment.

Design. The single within-subject manipulation was the topology of the category's causal network (1-1-3 or 1-1-1).
In addition, the same four between-subject counterbalancing factors from the first two experiments were used here.

Procedure. The procedure was identical to that of the first two experiments.

Results

The category membership ratings (presented in Appendix A) were once again analyzed by performing a multiple regression for each participant for each of his or her two networks. Once again, there was no effect of the four between-subject counterbalancing factors of theoretical significance, and the results for feature weights and interactions are presented collapsed over these factors.

Feature weights. In Experiment 3 the peripheral features are F3, F4, and F5 in the 1-1-3 condition; in the 1-1-1 condition it is F3, F4, or F5. Feature F1 is the primary cause in all four of the networks in Figure 11. Regression weights averaged over participants for each feature type are presented for the 1-1-3 and 1-1-1 networks in Figures 13A and 13B, respectively. As predicted by CMT but not the other two models, the pattern of feature weights did not differ across the two causal networks (Figure 13C). A 2x3 ANOVA revealed no interaction between feature and network, confirming that the pattern of feature weights was unaffected by the topology of the causal network. There was, however, a main effect of feature, F(2, 142) = 12.57, MSE = 36.0, p < .0001, indicating that the three features were different from one another. The primary cause feature F1 was weighed most heavily (7.9 and 8.8 in the 1-1-3 and 1-1-1 conditions, respectively), followed by F2 (5.1 and 6.1), and then the peripheral features (4.5 and 5.5). Single degree of freedom contrasts indicated that while features F1 and F2 were significantly different, F(1, 71) = 15.89, MSE = 202.2, p < .001, F2 and the peripheral features were not, F(1, 71) = 1.15, MSE = 202.2, p > .20.
That is, while the feature weights in this experiment exhibited a significant primary cause effect (the first experiment in which they have done so), they did not exhibit a full causal status effect (i.e., a monotonic decrease in feature weights).

Feature interactions. The two-way interactions were aggregated according to whether the features were directly linked or indirectly correlated according to the pattern of expected correlations in Table 2. The two types of interaction weights are presented in Figures 13D and 13E. As expected, in both conditions the interactions between both directly-linked and indirectly-correlated features were positive, confirming once again that categorization ratings were sensitive to whether expected interfeature correlations were preserved. The interactions between indirectly-correlated feature pairs were again smaller than those between directly-linked pairs. In both conditions, each type of interaction was significantly greater than 0, p's < .0001. In addition, a 2x2 repeated measures ANOVA with network and interaction type as factors revealed an effect of interaction type, F(1, 71) = 24.33, MSE = 3.71, p < .0001, confirming that directly-linked interactions were greater than indirectly-correlated interactions. The interaction between network and interaction type was not significant, F < 1.

Discussion

Once again, the predictions of CSH were not supported by this experiment. CSH predicted that features F1 and F2 should both be more heavily weighed in the 1-1-3 network than in the 1-1-1 network, because in the former network they have a greater number of dependents. In fact, however, these features were not weighed relatively more heavily in the 1-1-3 network. On the positive side, CSH predicted the primary cause effect in which the primary cause feature F1 was weighed more heavily than the other features in both networks.
However, CSH also incorrectly predicted that intermediate feature F2 should be weighed more heavily than the peripheral features. RCH also incorrectly predicted that feature F2 would be relatively more important in the 1-1-3 network (in which it is involved in four relations) as compared to the 1-1-1 network (in which it is involved in two). Note that this result contrasts with those from Experiments 1 and 2, which found that an intermediate feature was weighed more heavily when it was involved in four versus two relations. The difference is that whereas those experiments manipulated the intermediate feature's number of causes, the current experiment only manipulated its number of effects. Apparently, a feature's importance increases with its total number of causes rather than its total number of relations or total number of effects; that is, a multiple cause effect obtains. Of the three theories, only CMT correctly predicted the absence of any difference in the pattern of feature weights between the 1-1-3 and 1-1-1 conditions, and the pattern of two-way interactions among features. Nevertheless, it is important to note that CMT's a priori predictions for both the 1-1-3 and 1-1-1 networks were that primary causes should be least important (because they have zero causes), whereas in fact they were the most important feature in each network. This primary cause effect is theoretically important because it represents the only empirical effect in this study not predicted by CMT. Nevertheless, we have been deemphasizing this result because the features which instantiate the primary causes differ from the rest of the features in systematic ways, for example, in the order in which they are presented on the computer screen.
Indeed, the influence of feature order is suggested by comparisons across the three experiments: Why should a primary cause effect appear in Experiment 3's 1-1-1 condition whereas in the (supposedly equivalent) 1-1-1 conditions of the first two experiments it was either marginally significant (Experiment 1) or absent altogether (Experiment 2)? Our interpretation of this result is that in this experiment's 1-1-1 condition the primary cause was the most heavily weighed feature partly because it was always instantiated by F1, a feature which may have been especially salient because it always appeared first. In comparison, the primary cause was instantiated by feature F1 in only half of the 1-1-1 conditions of Experiment 1 and only a third of those conditions in Experiment 2. Is it possible to determine from the present results whether a primary cause effect in fact obtains? Our experiments were not designed to provide a decisive answer to this question, but one way to get a better sense of whether there really is a primary cause effect is to collapse the results over the three experiments in order to reduce the effects of feature order. The average 1-1-1 feature weights from the 192 participants tested in Experiments 1-3 are presented in Figure 14, and in fact this figure shows an overall elevated weight on the primary cause (7.6) as compared to the intermediate feature (6.4) and the peripheral feature (6.3). A one-way ANOVA on these data revealed an overall effect of feature type, F(2, 382) = 6.27, MSE = 17.9, p < .01; the primary cause was significantly different from the two other features, F(1, 191) = 8.04, MSE = 45.9, p < .01, while the intermediate feature and the peripheral feature did not differ, F < 1.
Of course, the elevated weight on the primary cause in Figure 14 might still be due to feature order, because even over the three experiments the primary cause was instantiated more often by the salient feature F1 than by any other feature. Nevertheless, these results essentially replicate those from Rehder (2003a) for a four-element causal chain presented earlier in Figure 2C, which found an elevated weight on the primary cause relative to a control condition which was identical except for the presence of the causal relations (and which thus controls for order effects). On balance then, evidence from past and current studies seems to support the claim that features indeed have increased influence on classification judgments when they are the first cause in a category's causal network.

Theoretical Modeling

The foregoing results demonstrate that CMT, when augmented with an assumption regarding the importance of primary causes, provides a complete account of all the results from Experiments 1-3. It can be demonstrated that it provides a good quantitative account as well. Appendix B reports the results of fitting CMT to each of the four causal networks tested in this research: 2-1-2, 3-1-1, 1-1-3, and 1-1-1. The appendix confirms that CMT reproduces all the major qualitative trends in the empirical data. First, it accounts for the larger weight for feature F3 in Experiment 1's 2-1-2 condition as compared to the other 2-1-2 features (Figure 6A). It also accounts for the larger weight for feature F4 in Experiment 2's 3-1-1 condition (Figure 10A). CMT also reproduces the pattern of weights in Experiment 3's 1-1-3 condition (Figure 13A) and those in the 1-1-1 condition from all three experiments (Figure 14).
Finally, CMT reproduces the pattern of correlations between causally-linked features (the coherence effect), correctly predicting that ratings should be strongly sensitive to correlations between directly linked features, and less sensitive to correlations between indirectly connected features. When combined with the model fitting results from prior studies presented in Figure 2, these new fits mean that CMT has been successfully fit to all seven of the network topologies which have undergone systematic test. It is important to emphasize once again, however, that the pattern of results in these experiments was not perfectly in accord with CMT's a priori predictions. CMT only achieves its good fits with the help of a free parameter (the c parameter) that allows the primary cause feature of each network to receive greater weight than was predicted beforehand. We will present a number of possible reasons for the presence of the primary cause effect in the General Discussion. We also fit CSH to the data from Experiments 1-3. Details of the model fitting procedure and results are provided in Appendix C. As expected, CSH is unable to account for many qualitative aspects of the data, including the greater weight on features F3 and F4 in the 2-1-2 and 3-1-1 networks, respectively, or the interactions between features. We did not fit RCH to these data, because no quantitative version of RCH has been specified.

General Discussion

The purpose of this study was to assess how interfeature causal relations affect classification. There were three main findings. The first was a multiple cause effect: features increase in importance to the extent they have multiple causes rather than multiple effects or multiple relations. The second was a coherence effect: good category members are those whose combinations of features are likely to have been generated by a category's causal laws.
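The generative claim can be made concrete with a small sketch. The snippet below uses a noisy-OR parameterization of a causal model in the spirit of CMT; the parameter names and values (c, m, b) are illustrative assumptions for this sketch, not the fitted values reported in Appendix B. An exemplar's degree of category membership tracks the probability that the category's causal mechanisms would generate that combination of features.

```python
# Hypothetical noisy-OR parameterization in the spirit of CMT.
# The parameter values below are illustrative, not fitted ones.
c = 0.75  # probability that a primary (root) cause is present
m = 0.80  # probability that a present cause brings about its effect
b = 0.20  # background probability of an effect arising from other causes

def likelihood(exemplar, parents):
    """Probability that the causal model generates this feature vector.

    `exemplar` maps each feature name to 0/1; `parents` maps each
    feature to the list of its direct causes in the network."""
    p = 1.0
    for f, present in exemplar.items():
        causes = parents[f]
        if not causes:          # a primary cause
            pf = c
        else:                   # noisy-OR over present causes plus background
            p_absent = 1 - b
            for cause in causes:
                if exemplar[cause]:
                    p_absent *= 1 - m
            pf = 1 - p_absent
        p *= pf if present else (1 - pf)
    return p

# A 1-1-1 causal chain: F1 -> F2 -> F3
chain = {"F1": [], "F2": ["F1"], "F3": ["F2"]}

def L(bits):
    return likelihood(dict(zip(["F1", "F2", "F3"], bits)), chain)

# A coherent exemplar is far more likely to be generated than one that
# breaks the expected correlations:
print(L((1, 1, 1)))  # all features present
print(L((1, 0, 1)))  # F2 absent despite F1; F3 present despite no F2
```

On this sketch, an all-present exemplar receives a much higher generative probability than one missing only the intermediate feature, which is the signature of the coherence effect.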
Each of these results was predicted by CMT's generative view of classification. The third was a primary cause effect: the initial features of a causal network receive additional weight. We now describe each of these results in greater detail as we discuss the implications they have for each of the three theories.

Causal Status Hypothesis

The predictions of the causal status hypothesis (CSH) were generally not supported by the current experiments. One purpose of this study was to test CSH's prediction that a feature's importance increases as a function of its number of (direct or indirect) dependents. Its prediction regarding indirect dependents was tested in Experiments 1 and 3, which manipulated the primary causes' number of indirect dependents. In neither experiment was the relative importance of primary causes greater in those conditions with a larger number of indirect dependents. CSH's prediction regarding direct dependents was tested in Experiments 1-3. In Experiment 1 an intermediate feature was given one additional cause and one additional effect, in Experiment 2 it was given two additional causes, and in Experiment 3 it was given two additional effects. Whereas CSH predicted that the feature's importance should have increased in Experiments 1 and 3 (in which it has additional effects), it only increased in Experiments 1 and 2 (when it had additional causes). That is, feature importance varied with the number of causes, not the number of effects. CSH also predicted that the intermediate feature of a 1-1-1 network (which always had one dependent) should be weighed more heavily than the peripheral feature (which always had zero). In fact, however, in all three experiments intermediate features were never weighed more heavily than peripheral features. On the one hand, this result corroborates the findings presented earlier for a four-element causal chain (Figure 2C).
On the other hand, there have been reports of monotonically decreasing feature weights elsewhere in the literature (Ahn, 1998; Ahn et al., 2000; Sloman et al., 1998), and thus it is important to consider reasons for the different conclusions between studies. Three possibilities are now discussed.

Method of assessing feature weights. One difference between studies is the method used to assess feature weights. For example, in Ahn et al. (2000, Experiments 1 and 2) participants were instructed on categories with three features arranged in a causal chain (i.e., a 1-1-1 network), and then rated the category membership of an exemplar missing just the primary cause, one missing just the intermediate feature, and one missing just the peripheral feature. Exemplars missing the initial cause were rated lower than ones missing the intermediate feature, which in turn were rated lower than ones missing the peripheral feature, a result which was interpreted as providing evidence of monotonically decreasing feature weights. However, a drawback with this method for assessing feature weights is that it fails to control for interactions among features. For example, an exemplar missing the intermediate feature violates two expected correlations (with the primary cause and the peripheral feature) whereas one missing the peripheral feature violates only one. Indeed, in the current study as well, exemplars in the 1-1-1 condition missing only the intermediate feature had an average rating (56.3) which was significantly lower than those missing only the peripheral feature (61.7). But rather than concluding that intermediate features are more important than peripheral features, the regression analyses conducted here properly factor the influence of causal knowledge into the two independent effects of feature weights and feature interactions.
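The confound can be illustrated with a toy calculation. Suppose, purely hypothetically, that a regression model assigns every feature of a three-feature chain F1 → F2 → F3 the same weight but also assigns a positive interaction weight to each directly linked pair; the numeric values below are invented for illustration:

```python
# Illustrative only: equal feature weights plus positive interaction
# terms for the two directly linked pairs of a chain F1 -> F2 -> F3.
w = 5.0       # weight of each individual feature (hypothetical value)
w_pair = 3.0  # weight of each directly linked interaction (hypothetical)
pairs = [("F1", "F2"), ("F2", "F3")]

def rating(ex):
    # Features coded +1 (present) / -1 (absent), as in a
    # contrast-coded regression on category membership ratings.
    s = sum(w * ex[f] for f in ex)
    s += sum(w_pair * ex[a] * ex[b] for a, b in pairs)
    return s

missing_f2 = {"F1": 1, "F2": -1, "F3": 1}  # violates both correlations
missing_f3 = {"F1": 1, "F2": 1, "F3": -1}  # violates only one

print(rating(missing_f2), rating(missing_f3))
```

Even with identical feature weights, the exemplar missing the intermediate feature scores lower simply because it violates two expected correlations rather than one.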
Those analyses revealed that the difference between exemplars missing intermediate features and those missing peripheral features occurs because of the number of violated correlations, not because of a difference in their weight.

Feature frequency in natural categories. Evidence for monotonically decreasing weights has also been advanced by studies testing natural categories. However, as mentioned, a problem with natural categories is that a feature's role in a causal network is likely to be confounded with other variables. For example, features of natural categories have different diagnosticities or cue validities (the probability of the category given the feature) and different within-category frequencies or category validities (the probability of the feature given the category), and it is well known that these factors influence an exemplar's degree of category membership (Hampton, 1998; Rosch et al., 1975). Features of natural categories which are causally peripheral may be considered less important because they are less diagnostic and/or less frequent amongst category members than more central features.

Use of domain knowledge. Finally, another reason why some studies observe monotonically decreasing feature weights is that some of the materials may have induced participants to make use of prior domain knowledge. For example, one of the novel categories tested in Ahn et al. (2000) was a novel disease D with symptoms X, Y, and Z, in which X was described as the cause of Y, and Y was described as the cause of Z. (Fictitious medical terms were used rather than D, X, Y, and Z.) Although Ahn et al. assumed that the causal knowledge consisted of X→Y→Z, people understand that a disease causes its symptoms, so participants were likely to have assumed that D→X→Y→Z.
Participants may then have reasoned backwards from the symptoms to the disease, and thus the causally more proximal intermediate symptom Y may have been taken to be more diagnostic than the more peripheral symptom Z. (This proposal is discussed at greater length below.) In summary then, we found no support for CSH's principle that feature importance increases with the total number of (direct or indirect) dependents. It was also shown that feature weights do not invariably decrease monotonically along a causal chain. Finally, of course, CSH also provides no account of the coherence effect. Nevertheless, it is important to note that one of CSH's predictions, the primary cause effect, did receive empirical support. We discuss the primary cause effect at length below after first reviewing the evidence for and against RCH and CMT.

Relational Centrality Hypothesis

The relational centrality hypothesis (RCH) also generally fared poorly in the current set of experiments. The central principle of RCH is that feature importance should increase with the number of direct effects, but we failed to find any support for this claim. Experiments 1-3 manipulated an intermediate feature's number of causes and effects, and whereas RCH predicted that the feature's importance should have increased in all three experiments (because in all three it is involved in two additional relations), it increased only when it had additional causes (in Experiments 1 and 2) but not when it only had additional effects (Experiment 3). Not only was RCH alone inadequate, CSH/RCH also encountered numerous problems, because it inherits the failures of CSH and RCH. First, like CSH, CSH/RCH predicts that primary causes should increase in importance relative to the peripheral features when they have a larger number of indirect dependents, but this effect was observed in neither Experiment 1 nor Experiment 3.
Second, because both CSH and RCH predicted that the intermediate feature should be relatively more important in the 1-1-3 network than in the 1-1-1 network in Experiment 3, CSH/RCH predicts an especially large advantage for the intermediate feature. In fact, however, the intermediate feature was not relatively more important in the 1-1-3 network. Finally, for the 1-1-1 network in all three experiments CSH/RCH predicts an especially large advantage for the intermediate feature (which has one dependent and two relations) over the peripheral feature (which has zero dependents and one relation), but the results consistently showed no advantage for the intermediate feature in that network. In summary then, no support was found for RCH's principle that feature importance increases with the total number of relations. Of course, besides being unable to explain the multiple cause effect, RCH also provides no explanation of the primary cause effect, and CSH/RCH provides no explanation of the coherence effect. As accounts of the effect of causal knowledge on categorization, neither RCH nor CSH/RCH has much to recommend it.

Causal Model Theory

With one notable exception, the predictions of CMT were confirmed by the current experiments. First, an important success for CMT was that it uniquely provided an account of the coherence effect, the fact that combinations of features make for better or worse category members. CMT makes the specific prediction that good category members are those that exhibit consistency both between features which are directly causally related and between those which should be indirectly correlated in the category's causal network. In fact, all three experiments showed that CMT correctly predicted the complex pattern of interactions between features: positive two-way interactions between directly-linked features and smaller interactions between indirectly-correlated features.
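Why indirect interactions should be smaller follows from the statistics of a causal chain: correlation is attenuated across each intervening link. The sketch below enumerates the exact joint distribution of a three-feature chain under a hypothetical noisy-OR parameterization (the values c, m, b are illustrative assumptions, not fitted ones) and compares the direct and indirect correlations:

```python
from itertools import product

# Illustrative noisy-OR parameters for a chain F1 -> F2 -> F3
# (hypothetical values chosen for the demonstration, not fitted ones).
c, m, b = 0.75, 0.80, 0.20

def p_exemplar(f1, f2, f3):
    """Exact probability of one feature configuration under the chain."""
    p1 = c if f1 else 1 - c
    q2 = 1 - (1 - b) * ((1 - m) if f1 else 1.0)  # P(F2 = 1 | F1)
    p2 = q2 if f2 else 1 - q2
    q3 = 1 - (1 - b) * ((1 - m) if f2 else 1.0)  # P(F3 = 1 | F2)
    p3 = q3 if f3 else 1 - q3
    return p1 * p2 * p3

STATES = list(product([0, 1], repeat=3))
PROBS = [p_exemplar(*s) for s in STATES]

def corr(i, j):
    """Pearson correlation between two binary features of the chain."""
    mean = [sum(p * s[k] for s, p in zip(STATES, PROBS)) for k in range(3)]
    e_ij = sum(p * s[i] * s[j] for s, p in zip(STATES, PROBS))
    cov = e_ij - mean[i] * mean[j]
    sd = lambda k: (mean[k] * (1 - mean[k])) ** 0.5
    return cov / (sd(i) * sd(j))

# Directly linked pairs correlate more strongly than the indirectly
# connected pair, whose correlation is attenuated across the chain:
print(round(corr(0, 1), 3), round(corr(1, 2), 3), round(corr(0, 2), 3))
```

Under these assumed parameters, the F1F3 correlation remains positive but is noticeably weaker than either of the direct F1F2 and F2F3 correlations, matching the observed ordering of interaction weights.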
Overall, these experiments provided strong support for CMT's claim that good category members are those whose features were likely to be generated by the category's causal laws. To put the importance of the coherence effect in perspective, it is illuminating to consider what proportion of the variance in categorization ratings induced by causal knowledge can be attributed to the coherence effect. This study has demonstrated how regression analyses can be used to separately assess changes in feature weights from changes in two-way feature interactions, and thus the proportion of variance explained by these two orthogonal sets of predictors can be calculated. In fact, 30% of the variance was attributable to changes in feature weights and the remaining 70% to changes in the interaction terms. In other words, the most important effect brought about by causal knowledge is to make various combinations of features appear more or less coherent with respect to a set of causal laws. Relative to this, changes to the importance of individual features are secondary. Other studies have demonstrated the importance of coherence to classification. For example, Wisniewski (1995) found that certain artifacts were better examples of the category "captures animals" when they possessed certain combinations of features (e.g., "contains peanuts" and "caught a squirrel") but not others ("contains acorns" and "caught an elephant"). Similarly, Rehder and Ross (2001b) showed that artifacts were considered better examples of a category of pollution-cleaning devices when their features cohered (e.g., "has a metal pole with a sharpened end" and "works to gather discarded paper"), and worse examples when their features were incoherent ("has a magnet" and "removes mosquitoes"). Coherence affects other types of category-related judgments as well.
Rehder and Hastie (2004) found that participants' willingness to generalize a novel property displayed by an exemplar to an entire category varied as a function of the exemplar's coherence. Maximally coherent exemplars which satisfied all of a category's causal laws (items we referred to as theoretical ideals) supported the strongest generalizations. Finally, there are numerous studies demonstrating that theoretical knowledge linking category features alters the manner in which categories are learned, both when the learning is supervised (Murphy & Allopenna, 1994; Rehder & Ross, 2001; Waldmann et al., 1995; Wattenmaker et al., 1986) and unsupervised (Ahn & Medin, 1992; Kaplan & Murphy, 1999; Medin, Wattenmaker, & Hampson, 1987). Of course, in addition to the coherence effect, causal knowledge did induce changes to the importance of individual features, and another source of evidence in favor of CMT was the multiple cause effect which manifested itself throughout these experiments. CMT correctly predicted that an intermediate feature's importance would increase when it had additional causes (in Experiments 1 and 2) but not when it had only additional effects (in Experiment 3). These new results add to those for a common-effect network (Figure 1B) in which a peripheral feature was the most heavily weighed when it had three causes which generated it. Thus, support for the multiple cause effect has been found with multiple network topologies across multiple studies. Finally, formal model fitting also demonstrated that CMT was able to provide generally good quantitative fits to the four new network topologies tested here. In summary then, CMT received considerable support from the current experiments, as its generative view of classification accounted for both the coherence effect and the multiple cause effect.
Nevertheless, despite its good quantitative fits there was one important result which CMT failed to predict on an a priori basis: the primary cause effect.

The Primary Cause Effect

The third empirical result was the primary cause effect, in which features which are initial causes in a category's causal network have an inflated influence on categorization judgments. The evidence in favor of the primary cause effect provided here is not as conclusive as it might be, because these experiments were not designed to make strong inferences on the basis of within-network comparisons (such comparisons involved features which also varied in their presentation order). Nevertheless, other studies have demonstrated a primary cause effect relative to a control condition which controlled for feature order (Ahn et al., 2000; Rehder, 2003a; Rehder & Hastie, 2001a). Thus, taken in their entirety, past and current studies provide considerable support for the primary cause effect.

Why should a primary cause have greater influence on classification? Although this effect was predicted by CSH, that prediction rests on primary causes having the largest number of dependents; but, as we have seen, features do not in general increase in importance with their number of dependents, and thus we must look elsewhere for an explanation. Two potential rationales for the primary cause effect are discussed.

Primary Causes as Proto-Essences. One reason is that a primary cause may start to take on some of the characteristics of a defining or essential feature for the category. According to the principle of psychological essentialism, a category essence occurs in all members of a category and in members of no other category, and makes an object the kind of thing it is. Moreover, a category's essence is presumed to generate many of the observable properties of kinds (Gelman, 2003; Medin & Ortony, 1989).
On this view, because a primary cause has one characteristic of an essence (it causes, directly or indirectly, the rest of the features), it begins to take on another characteristic, namely, greater importance to category membership (Rehder & Hastie, 2001a). Of course, this effect was not so extreme that the primary cause became a true defining property for the category, because classification was also influenced by the presence or absence of other features (and by whether the entire set of features was mutually coherent).

Primary Causes as Reliable Diagnostic Cues. The second rationale also appeals to essentialism, but rather than identifying the essence with one of the known category features, it claims that classifiers presume the additional presence of an essence which is the ultimate cause of those features. It is clear that for some categories people have explicit knowledge of an underlying essence, such as when a disease produces a chain of observable symptoms. For such categories, classifiers reason causally backwards from symptoms to the disease, and this inference is presumably made with greater confidence for symptoms which are directly generated by that disease. But even when explicit knowledge about the nature of an essence is absent, research suggests that people assume the existence of an underlying defining cause nonetheless. For example, even young children view many natural kinds as being defined by underlying properties or characteristics (Gelman, 2003; Keil, 1989; Rips, 1989). Likewise, there is evidence that the essential feature of artifacts is the causal force responsible for their existence, namely, the intentions of their designer (Bloom, 1998; Keil, 1995; Matan & Carey, 2001; Rips, 1989).
These findings suggest that it is likely that the novel kinds and artifacts tested in the current study were viewed as having a hidden and defining cause, and they raise the possibility that a primary cause was weighed more heavily because it was viewed as being more reliably diagnostic of that hidden cause. This view, that classification can sometimes be seen as a case of diagnostic reasoning, is readily explicable in terms of CMT's generative view of classification, so long as the category's causal model is augmented with the underlying defining cause. For example, Figure 15 presents the probability that the features of a 1-1-1 network will be generated when the model also includes a hidden cause which is present in all category members and which directly causes the primary cause. The power of the causal mechanism linking features (parameter m) is varied from 0 (no causal link) to an intermediate value (a probabilistic cause) to 1 (a deterministic cause). Figure 15 shows that when the causal mechanisms are probabilistic, the primary cause is more probable than any other feature, because it is directly generated by the underlying cause. And of course, according to CMT, a feature which is generated with greater probability is one that is also more diagnostic of category membership, all else being equal. That is, CMT's generative view of classification subsumes the causal reasoning which takes place when one reasons diagnostically from observables to a hidden defining or essential property (Rehder, in press-a). Of course, the diagnostic view also predicts that intermediate features in a causal chain should be more diagnostic than peripheral features, a prediction illustrated in Figure 15 by the fact that intermediate features are more probable than the peripheral features. That is, like CSH, it predicts a full causal status effect rather than just a primary cause effect.
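The qualitative behavior just described can be sketched directly. The helper below is our own illustration (parameter names m and b follow Appendix B; the specific values are assumptions, not the paper's estimates); it computes the probability that each feature in a chain is present when an always-present hidden cause directly generates the primary cause.

```python
def chain_probs(m, b, n=3):
    """Probability that each of n chain features H -> F1 -> ... -> Fn is
    present, where the hidden cause H occurs in every category member,
    m is the mechanism strength, and b the background-cause probability."""
    probs = []
    p_cause = 1.0  # the hidden cause is always present
    for _ in range(n):
        # Fi is absent only if neither an operating mechanism (from its
        # cause, when present) nor its background cause brings it about.
        p = p_cause * (1 - (1 - m) * (1 - b)) + (1 - p_cause) * b
        probs.append(p)
        p_cause = p
    return probs

# With a probabilistic mechanism the primary cause is the most probable
# feature, and the drop from intermediate to peripheral is comparatively small.
p1, p2, p3 = chain_probs(m=0.75, b=0.20)
```

With m = 1 every feature is generated with certainty, and with m = 0 each feature occurs only at its background rate, mirroring the two extremes plotted in Figure 15.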
However, it is noteworthy that Figure 15 shows at most only a small probability difference between the intermediate and peripheral features, suggesting that the failure to observe a full causal status effect in many empirical studies (including this one) may be due to its small size. Overall, the similarity between the theoretical predictions in Figure 15 for probabilistic causes and the empirical regression weights for a 1-1-1 network in Figure 14 is striking.

Further research will be required to determine whether the inflated influence of primary causes is due to their being treated as essences, as reliable diagnostic cues, or both. Regarding the diagnostic view at least, there is some additional evidence which can be advanced in its support. First, Rehder (2003b, Experiment 3) instructed participants on a category with three features related in a causal chain plus an unobserved defining feature which was the cause of the primary cause, and found an elevated weight on the primary cause (and a weight on the intermediate feature which was marginally higher than on the peripheral feature). In other words, an underlying defining cause is sufficient to produce the pattern of results shown in Figures 14 and 15. Second, Rehder and Burnett (2005) tested how people use causal knowledge to infer the value of unobserved features in category members, and found that inferences could only be fully explained by assuming that the categories were viewed as possessing underlying causal mechanisms. Thus, there appears to be a growing body of evidence suggesting that categories are thought to be organized around underlying hidden causes which influence how categories are used in a variety of inferential tasks.

Summary

There were three findings regarding how interfeature causal relations affect classification.
The first was a multiple cause effect in which a feature's importance increases with its number of causes rather than its number of effects or causal relations. The second was a coherence effect in which good category members are those whose features jointly corroborate the category's causal knowledge. These two effects can be accounted for by assuming that good category members are those likely to be generated by a category's causal laws. The third finding was a primary cause effect in which primary causes become more influential in judgments of category membership. This result can also be accounted for by a generative account by making one additional assumption: that categories are organized around hidden generative causes.

References

Ahn, W. (1998). Why are different features central for natural kinds and artifacts? The role of causal status in determining feature centrality. Cognition, 69, 135-178.
Ahn, W., Kim, N. S., Lassaline, M. E., & Dennis, M. J. (2000). Causal status as a determinant of feature centrality. Cognitive Psychology, 41, 361-416.
Ahn, W., & Medin, D. L. (1992). A two-stage model of category construction. Cognitive Science, 16, 81-121.
Bloom, P. (1998). Theories of artifact categorization. Cognition, 66, 87-93.
Gelman, S. A. (2003). The essential child: The origins of essentialism in everyday thought. New York: Oxford University Press.
Hadjichristidis, C., Sloman, S. A., Stevenson, R., & Over, D. (2004). Feature centrality and property induction. Cognitive Science, 28(1), 45-74.
Hampton, J. A. (1998). Similarity-based categorization and fuzziness of natural categories. Cognition, 65, 137-165.
Heit, E., & Rubinstein, J. (1994). Similarity and property effects in inductive reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(2), 411-422.
Kaplan, A. S., & Murphy, G. L. (1999). The acquisition of category structure in unsupervised learning. Memory & Cognition, 27(4), 699-712.
Keil, F. C. (1989). Concepts, kinds, and cognitive development. Cambridge, MA: MIT Press.
Keil, F. C. (1995). The growth of causal understandings of natural kinds. In D. Sperber, D. Premack & A. J. Premack (Eds.), Causal cognition: A multidisciplinary approach (pp. 234-262). Oxford: Clarendon Press.
Kim, N. S., & Ahn, W. (2002). Clinical psychologists' theory-based representations of mental disorders affect their diagnostic reasoning and memory. Journal of Experimental Psychology: General, 131(4), 451-476.
Lien, Y., & Cheng, P. W. (2000). Distinguishing genuine from spurious causes: A coherence hypothesis. Cognitive Psychology, 40, 87-137.
Matan, A., & Carey, S. (2001). Developmental changes within the core of artifact concepts. Cognition, 78, 1-26.
Medin, D. L., Coley, J. D., Storms, G., & Hayes, B. K. (2003). A relevance theory of induction. Psychonomic Bulletin & Review, 10(3), 517-532.
Medin, D. L., & Ortony, A. (1989). Psychological essentialism. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 179-196). New York: Cambridge University Press.
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.
Medin, D. L., Wattenmaker, W. D., & Hampson, S. E. (1987). Family resemblance, conceptual cohesiveness, and category construction. Cognitive Psychology, 19, 242-279.
Murphy, G. L., & Allopenna, P. D. (1994). The locus of knowledge effects in concept learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(4), 904-919.
Nosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115(1), 39-57.
Pazzani, M. (1991). A computational theory of learning causal relationships. Cognitive Science, 15, 401-424.
Rehder, B. (2003a). Categorization as causal reasoning. Cognitive Science, 27(5), 709-748.
Rehder, B. (2003b). A causal-model theory of conceptual representation and categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29(6), 1141-1159.
Rehder, B. (in press-a). Essentialism as a generative theory of classification. In A. Gopnik & L. Schultz (Eds.), Causal learning: Psychology, philosophy, and computation.
Rehder, B. (in press-b). When similarity and causality compete in category-based property induction. Memory & Cognition.
Rehder, B., & Burnett, R. C. (2005). Feature inference and the causal structure of object categories. Cognitive Psychology, 50(3), 264-314.
Rehder, B., & Hastie, R. (2001a). Causal knowledge and categories: The effects of causal beliefs on categorization, induction, and similarity. Journal of Experimental Psychology: General, 130(3), 323-360.
Rehder, B., & Hastie, R. (2004). Category coherence and category-based property induction. Cognition, 91(2), 113-153.
Rehder, B., & Ross, B. H. (2001b). Abstract coherent categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(5), 1261-1275.
Rips, L. J. (1989). Similarity, typicality, and categorization. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 21-59). New York: Cambridge University Press.
Rosch, E. H., & Mervis, C. B. (1975). Family resemblance: Studies in the internal structure of categories. Cognitive Psychology, 7, 573-605.
Sloman, S. A. (1994). When explanations compete: The role of explanatory coherence on judgments of likelihood. Cognition, 52, 1-21.
Sloman, S. A., Love, B. C., & Ahn, W. (1998). Feature centrality and conceptual coherence. Cognitive Science, 22(2), 189-228.
Waldmann, M. R., Holyoak, K. J., & Fratianne, A. (1995). Causal models and the acquisition of category structure. Journal of Experimental Psychology: General, 124(2), 181-206.
Wattenmaker, W. D., Dewey, G. I., Murphy, T. D., & Medin, D. L. (1986). Linear separability and concept learning: Context, relational properties, and concept naturalness. Cognitive Psychology, 18, 158-194.
Wisniewski, E. J. (1995). Prior knowledge and functionally relevant features in concept learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(2), 449-468.

Appendix A

Empirical Results from Experiments 1-3

The average classification ratings for each of the 32 test exemplars in each condition of Experiments 1, 2, and 3 are presented in Tables A1, A2, and A3, respectively. The tables also include predicted ratings from fits of CMT and CSH, as discussed in Appendices B and C.

Appendix B

Fitting of Causal Model Theory (CMT)

According to CMT, knowledge of a category consists of a causal model which includes the category's features and the probabilistic causal mechanisms among those features. For example, Figure B1 presents a rendition of Experiment 1's 2-1-2 network in which the mechanisms between causally related features are depicted as diamonds. According to this formalization, when feature Fi is present it enables the operation of mechanism Mij such that Mij will, with some probability, bring about the presence of Fj. When Fi is absent, the operation of Mij is disabled and has no influence on Fj. An important characteristic of this formalization is that it reflects our intuitive understanding of the asymmetry of causal relations, because it is sensitive to which values of the binary features Fi and Fj are called "present" and which are called "absent," and to which feature is called the cause and which the effect.

Figure B1 also shows the parameters associated with the 2-1-2 causal model. Parameter mij is the probability that the causal mechanism Mij will successfully operate (i.e., will bring about the presence of Fj) when Fi is present. Parameter bj is the probability that Fj will be present even when it is not brought about by Fi, and can be interpreted as the probability that Fj is caused by some unspecified background cause.
Finally, parameter ci is the probability that a primary cause feature Fi will be present. The other causal networks (1-1-1, 3-1-1, and 1-1-3) can be formalized in an analogous manner.

The central prediction of CMT is that categorizers make classification decisions by estimating how likely an exemplar is to have been generated by a category's causal model. The likelihood that the model will generate the exemplar's particular set of features can be expressed as a function of the model's parameters. For example, consider the probability that the 2-1-2 network generates exemplar 00101 (i.e., F3 and F5 present; F1, F2, and F4 absent). This probability can be computed by multiplying the probabilities associated with the individual features. The probability that F1 is absent is (1 – c1), and the probability that F2 is absent is (1 – c2). The probability that F3 is present is b3, because F3 could only have been brought about by its background cause when neither F1 nor F2 is present (and thus neither mechanism M13 nor mechanism M23 operated). The probability that F4 is absent is (1 – m34)(1 – b4), because neither its background cause nor M34 operated. Finally, the probability that F5 is present is (1 – (1 – m35)(1 – b5)), that is, 1 minus the probability that F5 is absent. Multiplying these individual probabilities yields

P2-1-2(00101) = (1 – c1)(1 – c2)(b3)(1 – m34)(1 – b4)(1 – (1 – m35)(1 – b5))

As a second example, consider the probability that the 2-1-2 network generates exemplar 11010. The probability that F1 is present is c1, and the probability that F2 is present is c2. The probability that F3 is absent is (1 – m13)(1 – m23)(1 – b3), because its absence means that neither M13, M23, nor its background cause operated. The probability that F4 is present is b4, because, given the absence of F3, it could only have been brought about by its background cause. Finally, the probability that F5 is absent is (1 – b5).
Multiplying these individual probabilities yields

P2-1-2(11010) = c1c2(1 – m13)(1 – m23)(1 – b3)(b4)(1 – b5)

The probabilities of each of the 32 possible exemplars that can be formed on five binary dimensions can be generated in a similar manner, and are presented in Table B1 for the 2-1-2 network. The likelihood equations for the 3-1-1, 1-1-3, and 1-1-1 networks can be generated by applying the same basic principles (and are available from the authors upon request).

CMT was fit to each participant's categorization ratings for each of their two categories. Note that the model fits assumed that all m parameters were equal to one another, as were all the c parameters and all the b parameters. For example, for the 2-1-2 network in Figure B1 it was assumed that m13 = m23 = m34 = m35 = m, that c1 = c2 = c, and that b3 = b4 = b5 = b. The assumption that all m parameters are equal follows from the fact that many of the causal relationships were pretested to be of equal plausibility, and thus there was no reason to expect participants to assign them different strengths. Similarly, the features of each category were novel and unfamiliar, and thus there was no reason to expect that they would have different c and b parameters. (Equated parameters were also used in Rehder, 2003a, 2003b, which showed that separate c, m, and b parameters did not yield significantly better fits.) The parameters were similarly collapsed in the 3-1-1, 1-1-3, and 1-1-1 networks.

Each participant's categorization ratings for a given network were predicted with the following equation:

Rating(E) = K · Pnet(E; c, m, b)

where E is the particular exemplar being rated, net is the category's network (2-1-2, 3-1-1, etc.), Pnet is the likelihood that E was generated by that network as a function of c, m, and b, and K is a constant which scales CMT's probabilities onto the 0-100 rating scale.
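The likelihood computation and fitting procedure can be sketched as follows. The p_212 function mirrors the equations above with equated parameters; the synthetic ratings and the coarse grid search are our own illustrative stand-ins for the participants' data and the actual optimization routine.

```python
import itertools
import numpy as np

def p_212(ex, c, m, b):
    """Probability that the 2-1-2 model (F1, F2 -> F3 -> F4, F5) generates
    exemplar ex, a tuple of five 0/1 values, under equated parameters."""
    f1, f2, f3, f4, f5 = ex
    def root(present):               # primary causes F1 and F2
        return c if present else 1 - c
    def effect(present, n_causes_on):
        # Absent only if no operating mechanism and no background cause.
        p_absent = (1 - m) ** n_causes_on * (1 - b)
        return 1 - p_absent if present else p_absent
    return (root(f1) * root(f2) * effect(f3, f1 + f2)
            * effect(f4, f3) * effect(f5, f3))

exemplars = list(itertools.product([0, 1], repeat=5))

# Synthetic "observed" ratings generated from known parameter values.
true_K, true_c, true_m, true_b = 100.0, 0.75, 0.80, 0.15
obs = np.array([true_K * p_212(e, true_c, true_m, true_b) for e in exemplars])

# Coarse grid search over (c, m, b) minimizing squared deviation; for fixed
# likelihoods the optimal scaling constant K has a closed form.
grid = np.linspace(0.05, 0.95, 19)
best = None
for c, m, b in itertools.product(grid, repeat=3):
    p = np.array([p_212(e, c, m, b) for e in exemplars])
    K = (obs @ p) / (p @ p)
    sse = float(((obs - K * p) ** 2).sum())
    if best is None or sse < best[0]:
        best = (sse, K, c, m, b)
```

Because p_212 factorizes a proper causal graph, the 32 likelihoods sum to 1, and the grid search recovers the generating parameters exactly when they lie on the grid.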
For each of the two categories presented to a participant, the values of K, c, m, and b that minimized the squared deviation between the category membership ratings for the 32 data points and the predicted values were computed.

The best-fitting values of parameters K, c, m, and b averaged over participants are presented in Table B2 for all three experiments; the observed and predicted category membership ratings for each condition in Experiments 1, 2, and 3 are presented in Tables A1, A2, and A3, respectively. To assess the fits of CMT in terms of feature weights and interactions between features, the ratings predicted by CMT for each participant were subjected to the same multiple regression that was performed on the observed ratings. The resulting regression weights averaged over participants for each condition are presented in Figures B2 and B3 superimposed on the weights from the observed data. (Note that the results from the three 1-1-1 conditions of Experiments 1-3 have been collapsed together in Figures B3B and B3D. Figures B2 and B3 also include the fits of CSH, discussed in Appendix C.)

These figures confirm that CMT reproduces all the major qualitative trends in the data. First, CMT accounts for the larger weight for feature F3 in the 2-1-2 network as compared to the other 2-1-2 features. It also accounts for the larger weight for feature F4 in the 3-1-1 network. CMT makes each of these predictions because these features are more likely to be generated when they have multiple causes.
Finally, CMT also reproduces the pattern of weights in the 1-1-3 and 1-1-1 networks. Second, and equally importantly, for each network CMT also correctly reproduces the pattern of correlations between causally linked features, correctly predicting that ratings should be strongly sensitive to correlations between directly linked features, and less sensitive to correlations between indirectly connected features.

Overall then, quantitative model fitting of CMT to the data of Experiments 1-3 showed that it was able to account for the pattern of both feature weights and feature interactions. Some discrepancies between the observed and predicted regression weights are worth noting. First, for the 2-1-2 network CMT underpredicts the weight associated with F3 and overpredicts the interaction weight associated with directly linked feature pairs. One way that CMT could produce a higher weight for F3 is by increasing the value of the m parameter, because increasing the strength of the probabilistic causal mechanisms M13 and M23 would make F3 more likely to be generated, increasing its regression weight. However, increasing m would also increase CMT's weight on directly linked feature pairs, which is already too high. CMT's fits can be seen as a compromise between these two facts. Second, for the 3-1-1 network CMT overpredicts the weight on feature F4 and underpredicts the weight on the peripheral feature. One way that CMT could produce a higher weight for the peripheral feature is by increasing the weight on F4 (if F4 were more prevalent, then so too would be the peripheral feature), but the weight on F4 is already too high. Finally, for the 1-1-1 network CMT tends to underpredict the magnitude of the interaction weights. Nevertheless, Figures B2 and B3 show that CMT reproduces all the major qualitative trends in the data.
Appendix C

Fitting of the Causal Status Hypothesis (CSH)

The central assumption of CSH is that classification decisions are made by a weighted sum of features, where the features' weights (or centralities) are determined by their position in the causal network. Recall that the centrality of a feature is given by the iterative equation (Sloman et al., 1998):

ci,t+1 = Σj dij cj,t    (1)

where ci,t is the centrality of feature i at time t and dij is the dependency strength between features i and j. Because many of the causal relationships in the experiments were pretested to have equal strength, the ds were assumed to be equal (the same assumption was made for the m parameters in the CMT fits). The iterative equation is guaranteed to converge in a small number of steps, and can be used to derive analytical expressions for the weight of each feature in any given network. For example, for the 2-1-2 network of Experiment 1, the weights of the primary causes, the intermediate feature, and the peripheral features are 2d²c0, 2dc0, and c0, respectively.

Each participant's categorization ratings for a given network were predicted with the following equation:

Rating(E) = K(Σi ci fi) + b    (2)

where E is the particular exemplar being rated, ci is the centrality of feature i, fi codes the presence or absence of feature i in E (1 or 0), and b and K are parameters which translate (b) and scale (K) CSH's predictions onto the 0-100 rating scale. As described above, each ci is a function of d and c0; however, c0 is absorbed by K and is thus dropped as a free parameter. Parameter d was constrained to be ≥ 1, because when d is less than 1 effect features can become more central than cause features, contradicting CSH's main assumption.
Consistent with their roles as translation and scale parameters, b was constrained to the range [0, 100] and K was constrained to be positive. For each of the two categories presented to a participant, the values of d, K, and b that minimized the squared deviation between the category membership ratings for the 32 data points and the predicted values were computed. The best-fitting parameters averaged over participants are presented in Table C1; CSH's predicted category membership ratings are included in Tables A1, A2, and A3, respectively. To assess the fits of CSH in terms of feature weights and interactions between features, the ratings predicted by CSH for each participant were subjected to the same multiple regression that was performed on the observed ratings. The resulting weights averaged over participants for each condition are presented in Figures B2 and B3 superimposed on the weights from the observed data and the predictions of CMT.

These figures confirm that CSH is unable to account for many qualitative trends in the data. First, it is unable to account for the larger weight on F3 in the 2-1-2 network (Figure B2A). Second, it is unable to account for the larger weight on F4 in the 3-1-1 network (Figure B2B). Third, although it reproduces the relative ordering of features in the 1-1-3 network (Figure B3A), it drastically overestimates the weight on F2 and underestimates the weights on the peripheral features. On the positive side, CSH does provide good fits to the weights for the 1-1-1 network, as expected (Figure B3B). However, and most critically, CSH predicts zero interaction weights for all four networks (Figures B2C, B2D, B3C, and B3D). Note that this final result is necessitated by the fact that CSH's predicted ratings are a weighted sum of features (Eq. 2) and thus cannot, in principle, provide an account of the coherence effect.
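This in-principle limitation can be checked directly: ratings generated from the weighted-sum rule of Eq. 2 yield exactly zero two-way interaction coefficients when submitted to the regression. The centrality values and the d, K, b settings below are illustrative assumptions; the result holds for any additive weights.

```python
import numpy as np

# Illustrative CSH centralities for the 2-1-2 network (c0 absorbed into K):
# primary causes, intermediate feature, peripheral features.
d, K, b = 2.0, 5.0, 10.0
w = np.array([2 * d ** 2, 2 * d ** 2, 2 * d, 1.0, 1.0])

# Ratings from Eq. 2 over all 32 exemplars (0/1 feature coding).
feats = np.array([[(e >> i) & 1 for i in range(5)] for e in range(32)], float)
ratings = K * (feats @ w) + b

# Regression with an intercept, 5 main effects, and all 10 two-way interactions.
pairs = [(i, j) for i in range(5) for j in range(i + 1, 5)]
X = np.column_stack([np.ones(32), feats]
                    + [feats[:, i] * feats[:, j] for i, j in pairs])
beta, *_ = np.linalg.lstsq(X, ratings, rcond=None)

main_w, inter_w = beta[1:6], beta[6:]   # inter_w is (numerically) all zero
```

Because the ratings are exactly additive in the features and the design matrix is full rank, the unique least-squares solution puts all the weight on the main effects.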
The difficulty that CSH experiences in fitting these data is also reflected in its parameter estimates. For CSH, relative feature weights are determined by the dependency parameter d. When d is 1, there is no difference in centrality between a cause and an effect feature. In fact, in 201 out of the 384 fits (two fits for each of the 192 participants) the optimal value of d was 1. Even when all three parameters (d, b, and K) were allowed to take any value, 192 out of 384 d's were smaller than 1. However, these estimates for d violate the primary theoretical tenet of CSH: cause features are more important than effects.

Finally, CSH's qualitative failures are reflected in the fact that in every condition of every experiment, CMT achieved a smaller SSE than CSH (Tables B2 and C1). This advantage for CMT also obtains using a measure of degree of fit (RMSD) which corrects for the fact that CMT has a greater number of parameters (4) than CSH (3): RMSD = sqrt(SSE / (32 – p)), where p is the number of parameters. Again, CSH's poor fits are not due to the constraints we placed on d, K, and b: even when those parameters were allowed to vary freely, CSH achieved an overall average RMSD of 16.3 as compared to CMT's 14.6.
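The parameter-corrected fit measure is straightforward to compute; the SSE value used here is a made-up example, not one of the reported fits.

```python
import math

def rmsd(sse, n_params, n_points=32):
    """Root-mean-square deviation corrected for the number of free
    parameters, per the text: RMSD = sqrt(SSE / (n_points - n_params))."""
    return math.sqrt(sse / (n_points - n_params))

# CMT has 4 free parameters (K, c, m, b); CSH has 3 (d, K, b). For the same
# SSE, the model with more parameters is penalized with a larger RMSD.
example = rmsd(2800.0, 4)   # sqrt(2800 / 28) = 10.0
```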

Publication date: 2006